On the convergence of stochastic iterative dynamic-programming algorithms

Article ID: iaor1999869
Country: United States
Volume: 6
Issue: 6
Start Page Number: 1185
End Page Number: 1201
Publication Date: Nov 1994
Journal: Neural Computation
Authors: , ,
Keywords: programming: dynamic, artificial intelligence
Abstract:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton and the Q-learning algorithm of Watkins, can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
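To make the DP-based learning concrete, here is a minimal sketch of the tabular Q-learning update (Watkins) discussed in the abstract. The toy two-state MDP, the constant step size, and all numeric values are illustrative assumptions, not taken from the paper.

```python
import random

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate (stochastic-approximation step size)

# Hypothetical deterministic toy model: (state, action) -> (next_state, reward)
MODEL = {
    (0, 0): (0, 0.0), (0, 1): (1, 1.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

random.seed(0)
s = 0
for _ in range(5000):
    a = random.randrange(N_ACTIONS)   # explore uniformly at random
    s_next, r = MODEL[(s, a)]
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a'),
    # a stochastic approximation to the dynamic-programming fixed point
    target = r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (target - Q[s][a])
    s = s_next
```

Under conditions on the step sizes and exploration such as those the paper's convergence theorem formalizes, the iterates Q(s, a) approach the optimal action values; in this deterministic toy model they converge to the exact fixed point.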
