| Article ID: | iaor20018 |
| Country: | United States |
| Volume: | 14 |
| Issue: | 2 |
| Start Page Number: | 243 |
| End Page Number: | 258 |
| Publication Date: | Jan 2000 |
| Journal: | Probability in the Engineering and Informational Sciences |
| Authors: | Borkar V.S. |
A simulation-based algorithm for learning good policies for a discrete-time stochastic control process with unknown transition law is analyzed when the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme of discrete state/action problems along the lines of Baker. Almost sure convergence is proved under suitable conditions.