Article ID: | iaor200954193 |
Country: | United States |
Volume: | 33 |
Issue: | 4 |
Start Page Number: | 880 |
End Page Number: | 898 |
Publication Date: | Nov 2008 |
Journal: | Mathematics of Operations Research |
Authors: | Basu Arnab, Bhattacharyya Tirthankar, Borkar Vivek S |
Keywords: | learning |
A reinforcement learning algorithm based on linear function approximation is proposed for Markov decision processes with infinite-horizon risk-sensitive cost. Its convergence is proved using the “ODE” (ordinary differential equation) method for stochastic approximation. The scheme is also extended to continuous state space processes.
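
For readers unfamiliar with the cost criterion named in the abstract, the following is a minimal sketch of what "infinite-horizon risk-sensitive cost" typically means for a finite Markov chain. It is not the algorithm of the paper. It assumes the standard formulation in which the cost is limsup (1/n) log E[exp(sum of c(X_m, X_{m+1}))], a known transition matrix `P`, and a per-transition cost matrix `c`; the function name and the example numbers are hypothetical. Under these assumptions the cost equals the logarithm of the principal eigenvalue of the matrix with entries P[i, j] * exp(c[i, j]), which the sketch computes by a normalized power iteration. A learning scheme such as the one described would instead work from simulated transitions and a linear parameterization of the eigenvector.

```python
import numpy as np


def risk_sensitive_cost(P, c, iters=500):
    """Exact risk-sensitive cost of a finite chain (illustrative sketch only).

    P : (n, n) row-stochastic transition matrix, assumed to have all
        entries positive so that the power iteration converges.
    c : (n, n) per-transition cost matrix (an assumed cost structure).
    Returns (cost, v), where cost = log(principal eigenvalue) and v is the
    corresponding eigenvector normalized so that v[0] == 1.
    """
    M = P * np.exp(c)           # elementwise: M[i, j] = P[i, j] * exp(c[i, j])
    v = np.ones(P.shape[0])     # initial guess with v[0] == 1
    lam = 1.0
    for _ in range(iters):
        w = M @ v
        lam = w[0]              # eigenvalue estimate, valid because v[0] == 1
        v = w / w[0]            # renormalize against reference state 0
    return float(np.log(lam)), v


# Hypothetical two-state example.
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
c = np.array([[0.1, 0.4],
              [0.3, 0.0]])
cost, v = risk_sensitive_cost(P, c)
print("risk-sensitive cost:", cost)
print("eigenvector (v[0] = 1):", v)
```

Normalizing against a fixed reference state keeps the iterates bounded and makes the eigenvalue estimate a single entry of the product; this is a convenience of the sketch, not a claim about how the paper's scheme is organized.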