Article ID: | iaor20051045 |
Country: | China |
Volume: | 25 |
Issue: | 5 |
Start Page Number: | 377 |
End Page Number: | 380 |
Publication Date: | Sep 2003 |
Journal: | Journal of Yunnan University |
Authors: | Hu Guanghua |
A stochastic gradient algorithm is proposed for the average reward of Markov decision processes that depend on a parameter vector. A new gradient formula for the objective function is given, and a stochastic approximation algorithm based on a single sample path is presented. Finally, a proof of convergence of the gradient (with probability 1) is provided.
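
The abstract does not reproduce the paper's gradient formula or its update rule, so the following is only a minimal sketch of the general idea: estimating the gradient of the average reward of a parameterized Markov decision process from a single sample path. It is not the author's algorithm. It uses a standard discounted eligibility-trace (score-function) estimator, which is biased for a discount factor below 1. The toy two-state MDP, the softmax policy, the discount `beta`, the step size, and all function names are assumptions introduced for illustration.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# Taking action a moves to state a with probability 0.9.
# Action 1 pays reward 1 and action 0 pays reward 0, in either state.
N_STATES, N_ACTIONS = 2, 2
P = np.zeros((N_STATES, N_ACTIONS, N_STATES))  # P[s, a, s_next]
P[:, 0, :] = [0.9, 0.1]
P[:, 1, :] = [0.1, 0.9]
R = np.array([[0.0, 1.0], [0.0, 1.0]])  # R[s, a]


def policy(theta):
    """Softmax policy mu(a | s; theta): one row of probabilities per state."""
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def average_reward(theta):
    """Exact long-run average reward, computed from the stationary distribution."""
    mu = policy(theta)
    P_theta = np.einsum("sa,sat->st", mu, P)  # state-to-state transition matrix
    # Solve pi P_theta = pi together with sum(pi) = 1 as one linear system.
    A = np.vstack([P_theta.T - np.eye(N_STATES), np.ones(N_STATES)])
    b = np.append(np.zeros(N_STATES), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(pi @ (mu * R).sum(axis=1))


def estimate_gradient(theta, n_steps, beta, rng):
    """Estimate the average-reward gradient from one simulated sample path."""
    mu = policy(theta)
    trace = np.zeros_like(theta)  # discounted sum of past score vectors
    grad = np.zeros_like(theta)   # running mean of reward * trace
    s = 0
    for t in range(n_steps):
        a = rng.choice(N_ACTIONS, p=mu[s])
        score = np.zeros_like(theta)  # gradient of log mu(a | s) w.r.t. theta
        score[s] = -mu[s]
        score[s, a] += 1.0
        trace = beta * trace + score
        grad += (R[s, a] * trace - grad) / (t + 1)
        s = rng.choice(N_STATES, p=P[s, a])
    return grad


def gradient_ascent(theta, n_iters, step, n_steps, beta, rng):
    """Repeatedly move theta along the estimated gradient (assumed fixed step)."""
    theta = theta.copy()
    for _ in range(n_iters):
        theta += step * estimate_gradient(theta, n_steps, beta, rng)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta0 = np.zeros((N_STATES, N_ACTIONS))
    theta = gradient_ascent(theta0, n_iters=25, step=1.0,
                            n_steps=2000, beta=0.8, rng=rng)
    print("average reward before:", average_reward(theta0))
    print("average reward after: ", average_reward(theta))
```

The exact `average_reward` function is there only to check the estimator on a toy problem; a single-sample-path method is of interest precisely when the stationary distribution cannot be computed this way. A convergence guarantee such as the one the abstract mentions would typically require a decreasing step-size sequence rather than the fixed step assumed here.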