On the convergence of stochastic iterative dynamic-programming algorithms

Article ID:	iaor1999869
Country:	United States
Volume:	6
Issue:	6
Start Page Number:	1185
End Page Number:	1201
Publication Date:	Nov 1994
Journal:	Neural Computation
Authors:	Jaakkola T., Jordan M.I., Singh S.P.
Keywords:	programming: dynamic, artificial intelligence

Abstract:

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton and the Q-learning algorithm of Watkins, can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
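
As an informal illustration of the setting the abstract describes, the sketch below runs tabular Q-learning as a stochastic approximation to the DP fixed point and compares it with the result of exact value iteration. It is not taken from the paper: the two-state MDP, the discount factor, the uniform exploration policy, and the step-size schedule n^(-0.7) (chosen so that the step sizes sum to infinity while their squares sum to a finite value) are all assumptions made for the example.

```python
import random

# Hypothetical 2-state, 2-action MDP (an assumption, not from the paper).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.2, 0, 0.0), (0.8, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(0.5, 1, 2.0), (0.5, 0, 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def value_iteration(tol=1e-10):
    """Exact DP: iterate the Bellman optimality operator until it stops changing."""
    Q = {(s, a): 0.0 for s in P for a in P[s]}
    while True:
        delta = 0.0
        for s in P:
            for a in P[s]:
                new = sum(
                    p * (r + GAMMA * max(Q[(s2, b)] for b in P[s2]))
                    for p, s2, r in P[s][a]
                )
                delta = max(delta, abs(new - Q[(s, a)]))
                Q[(s, a)] = new
        if delta < tol:
            return Q


def sample(s, a, rng):
    """Draw one (next_state, reward) pair from the transition model."""
    u = rng.random()
    acc = 0.0
    for p, s2, r in P[s][a]:
        acc += p
        if u < acc:
            return s2, r
    # Fallback for floating-point round-off in the cumulative sum.
    _, s2, r = P[s][a][-1]
    return s2, r


def q_learning(steps=200_000, seed=0):
    """Tabular Q-learning with per-pair decaying step sizes."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in P for a in P[s]}
    visits = {key: 0 for key in Q}
    s = 0
    for _ in range(steps):
        # Uniform random exploration keeps every (state, action) pair visited.
        a = rng.choice(list(P[s]))
        s2, r = sample(s, a, rng)
        visits[(s, a)] += 1
        # Step size n^(-0.7): sum diverges, sum of squares converges.
        alpha = visits[(s, a)] ** -0.7
        target = r + GAMMA * max(Q[(s2, b)] for b in P[s2])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    for key in sorted(q_star):
        print(key, round(q_star[key], 3), round(q_hat[key], 3))
```

The sampled update replaces the expectation in the DP backup with a single observed transition, which is the sense in which Q-learning approximates dynamic programming; the decaying step sizes are what allow the sampling noise to average out.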
