| Article ID: | iaor1999815 |
| Country: | United States |
| Volume: | 16 |
| Issue: | 3 |
| Start Page Number: | 185 |
| End Page Number: | 202 |
| Publication Date: | Jul 1994 |
| Journal: | Machine Learning |
| Authors: | Tsitsiklis J.N. |
| Keywords: | programming: dynamic, Markov processes |
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
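To make the abstract concrete, here is a minimal sketch of tabular Q-learning on a hypothetical two-state, two-action Markov decision problem (the MDP, its rewards, and all function names are illustrative, not from the paper). The step sizes decay as n^(-0.6), satisfying the usual stochastic-approximation conditions (the sum diverges, the sum of squares converges), which is the kind of assumption under which convergence results of this type are established.

```python
import random

def q_learning(steps=20000, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy 2-state MDP (illustrative example).

    Dynamics: action 1 gives reward 1 and flips the state;
    action 0 gives reward 0 and keeps the state.
    Analytically, Q*(s,1) = 10 and Q*(s,0) = 9 for both states
    when gamma = 0.9.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]       # Q[state][action]
    visits = [[0, 0], [0, 0]]          # per-(s,a) update counts
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)           # uniform exploration: every (s,a) is visited infinitely often
        r = 1.0 if a == 1 else 0.0
        s_next = 1 - s if a == 1 else s
        visits[s][a] += 1
        # Diminishing step size n^(-0.6): sum(alpha) = inf, sum(alpha^2) < inf
        alpha = visits[s][a] ** -0.6
        target = r + gamma * max(Q[s_next])        # one-step bootstrap target
        Q[s][a] += alpha * (target - Q[s][a])      # stochastic-approximation update
        s = s_next
    return Q

Q = q_learning()
# Q approaches the optimal values Q*(s,1) = 10 and Q*(s,0) = 9.
```

Viewed this way, Q-learning is a stochastic approximation scheme whose iterates track the fixed point of the dynamic programming operator, which is why convergence conditions of Robbins-Monro type appear in the analysis.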