Article ID: | iaor1999815 |
Country: | United States |
Volume: | 16 |
Issue: | 3 |
Start Page Number: | 185 |
End Page Number: | 202 |
Publication Date: | Jul 1994 |
Journal: | Machine Learning |
Authors: | Tsitsiklis J.N. |
Keywords: | programming: dynamic, Markov processes |
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
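As a rough illustration of the asynchronous stochastic-approximation update studied in the paper, the sketch below shows a minimal tabular Q-learning loop. The toy chain MDP, the epsilon-greedy exploration rule, and the 1/n(s,a) step-size schedule are assumptions introduced for this example only; the article itself concerns convergence conditions for the update rule, not any particular implementation.

```python
# Minimal tabular Q-learning sketch (illustrative only; not the paper's notation or code).
# The toy chain MDP, epsilon-greedy exploration, and 1/n(s,a) step sizes are assumptions.
import random
from collections import defaultdict

def q_learning(n_states=5, n_actions=2, gamma=0.9, episodes=2000, epsilon=0.1):
    """Q-learning on a hypothetical chain MDP: action 0 moves left, action 1 moves
    right; reaching the rightmost state yields reward 1 and ends the episode."""
    Q = defaultdict(float)      # Q[(state, action)] -> estimated action value
    visits = defaultdict(int)   # per-(state, action) visit counts for step sizes

    def step(state, action):
        """Transition/reward function of the toy chain (an assumption for this sketch)."""
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        done = next_state == n_states - 1
        return next_state, reward, done

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)

            # Asynchronous stochastic-approximation update: only the visited
            # (state, action) pair is updated, with a diminishing step size.
            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]
            target = reward if done else reward + gamma * max(
                Q[(next_state, a)] for a in range(n_actions))
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    print({k: round(v, 3) for k, v in sorted(Q.items())})
```

The update only touches the component of Q corresponding to the visited state-action pair, which is what makes it an asynchronous variant of stochastic approximation in the sense analyzed by the paper.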