Article ID: iaor20018
Country: United States
Volume: 14
Issue: 2
Start Page Number: 243
End Page Number: 258
Publication Date: Jan 2000
Journal: Probability in the Engineering and Informational Sciences
Authors: Borkar V.S.
A simulation-based algorithm for learning good policies for a discrete-time stochastic control problem with unknown transition law is analyzed in the setting where the state and action spaces are compact subsets of Euclidean spaces. This extends the Q-learning scheme for discrete state/action problems along the lines of Baker. Almost sure convergence is proved under suitable conditions.
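The paper itself works directly with compact Euclidean state and action spaces; the following is only a minimal illustrative sketch of the underlying Q-learning idea on a crude discretization of such a space, not the author's algorithm. The transition dynamics, reward function, grid resolution, and step-size rule below are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(s, a, rng):
    """Hypothetical unknown transition law on [0, 1]: drift toward the action."""
    s_next = np.clip(s + 0.5 * (a - s) + 0.1 * rng.standard_normal(), 0.0, 1.0)
    reward = -abs(s_next - 0.5)  # illustrative reward, largest when state is near 0.5
    return s_next, reward

def q_learning(n_bins=10, steps=2000, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a uniform discretization of state and action spaces."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_bins, n_bins))             # Q[state_bin, action_bin]
    grid = (np.arange(n_bins) + 0.5) / n_bins  # bin centers in [0, 1]
    visits = np.zeros_like(Q)
    s = 0.0
    for _ in range(steps):
        i = min(int(s * n_bins), n_bins - 1)
        # epsilon-greedy action selection from the current Q estimate
        j = int(rng.integers(n_bins)) if rng.random() < eps else int(Q[i].argmax())
        s_next, r = simulate(s, grid[j], rng)
        k = min(int(s_next * n_bins), n_bins - 1)
        visits[i, j] += 1
        alpha = 1.0 / visits[i, j]             # decreasing step size (stochastic approximation)
        Q[i, j] += alpha * (r + gamma * Q[k].max() - Q[i, j])
        s = s_next
    return Q, grid

Q, grid = q_learning()
```

The decreasing step size `1 / visits[i, j]` reflects the stochastic-approximation viewpoint under which almost sure convergence results of this kind are typically proved; the paper's contribution is handling the continuous (compact) case directly rather than by discretization as above.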