| Article ID: | iaor200954193 |
| Country: | United States |
| Volume: | 33 |
| Issue: | 4 |
| Start Page Number: | 880 |
| End Page Number: | 898 |
| Publication Date: | Nov 2008 |
| Journal: | Mathematics of Operations Research |
| Authors: | Basu Arnab, Bhattacharyya Tirthankar, Borkar Vivek S |
| Keywords: | learning |
A linear function approximation-based reinforcement learning algorithm is proposed for Markov decision processes with infinite-horizon risk-sensitive cost. Its convergence is proved using the “ODE” (ordinary differential equation) method for stochastic approximation. The scheme is also extended to continuous state space processes.
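
For orientation, the objects named in the abstract can be written in their standard textbook form. This is a sketch under assumed notation, not necessarily the formulation used in the paper: the state process X_m, control process Z_m, per-stage cost c, feature map \phi, parameter \theta, step sizes a(n), mean field h and noise M_{n+1} are all illustrative symbols introduced here.

```latex
% Infinite-horizon risk-sensitive cost (standard exponential-of-sum form, assumed notation)
\lambda \;=\; \limsup_{n \to \infty} \frac{1}{n}
  \log \mathbb{E}\!\left[ \exp\!\Big( \sum_{m=0}^{n-1} c(X_m, Z_m) \Big) \right]

% Linear function approximation of a value-like function V over states i
V(i) \;\approx\; \phi(i)^{\top} \theta, \qquad \theta \in \mathbb{R}^{d}

% Generic stochastic approximation iteration with martingale difference noise M_{n+1}
\theta_{n+1} \;=\; \theta_n + a(n)\,\big[\, h(\theta_n) + M_{n+1} \,\big],
\qquad \sum_{n} a(n) = \infty, \quad \sum_{n} a(n)^2 < \infty

% Limiting ODE whose asymptotic behaviour the iterates track (the "ODE" method)
\dot{\theta}(t) \;=\; h\big(\theta(t)\big)
```

In the ODE method, convergence of the iterates is typically deduced from the stability properties of the limiting differential equation; the specific form of h for the proposed algorithm is given in the paper itself and is not reproduced here.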