| Article ID: | iaor20051045 |
| Country: | China |
| Volume: | 25 |
| Issue: | 5 |
| Start Page Number: | 377 |
| End Page Number: | 380 |
| Publication Date: | Sep 2003 |
| Journal: | Journal of Yunnan University |
| Authors: | Hu Guanghua |
A stochastic gradient algorithm is proposed for Markov decision processes whose average reward depends on a parameter vector. A new gradient for the objective function is given, and a stochastic approximation algorithm based on a single sample path is presented. Finally, the gradient is proved to converge with probability 1.
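
To make the idea of a single-sample-path gradient method concrete, here is a minimal sketch. It is not the algorithm of the paper, whose gradient formula is not reproduced in this record. It uses a generic likelihood-ratio (score-function) estimator with an eligibility trace, and everything in it is an assumption made for illustration: the two-state toy MDP `P`, the sigmoid policy, the reward (equal to the next state), and the values of `alpha`, `eta`, and `beta`.

```python
import math
import random

# Hypothetical toy MDP (an assumption, not from the paper):
# P[s][a] is the probability that the next state is 1.
# The reward on each transition equals the next state (0 or 1).
P = [[0.1, 0.9], [0.2, 0.8]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def average_reward(theta):
    """Exact long-run average reward of the policy Pr(a=1 | s) = sigmoid(theta[s])."""
    q = []
    for s in range(2):
        p = sigmoid(theta[s])
        q.append((1.0 - p) * P[s][0] + p * P[s][1])  # Pr(next state = 1 | s)
    # Stationary probability of state 1 in the induced two-state chain.
    return q[0] / (1.0 - q[1] + q[0])


def train(steps=50000, alpha=0.02, eta=0.05, beta=0.9, seed=0):
    """Update theta along one simulated trajectory, with no restarts."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # parameter vector
    z = [0.0, 0.0]       # eligibility trace of score-function terms
    rho = 0.0            # running estimate of the average reward
    s = 0
    for _ in range(steps):
        p = sigmoid(theta[s])
        a = 1 if rng.random() < p else 0
        s_next = 1 if rng.random() < P[s][a] else 0
        r = float(s_next)
        # Decay the trace, then add d/dtheta[s] of log pi(a | s), which is a - p.
        z = [beta * zi for zi in z]
        z[s] += a - p
        # Track the average reward and step along the noisy gradient estimate.
        rho += eta * (r - rho)
        for i in range(2):
            theta[i] += alpha * (r - rho) * z[i]
        s = s_next
    return theta
```

Calling `average_reward(train())` and comparing it with `average_reward([0.0, 0.0])` shows whether the single-path updates improved on the initial policy. The trace decay `beta` trades bias for variance in this kind of estimator; a convergence guarantee such as the one claimed in the abstract would normally also require decreasing step sizes, which this sketch does not use.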