| Article ID: | iaor20072029 |
| Country: | United States |
| Volume: | 28 |
| Issue: | 4 |
| Start Page Number: | 752 |
| End Page Number: | 776 |
| Publication Date: | Nov 2003 |
| Journal: | Mathematics of Operations Research |
| Authors: | Cavazos-Cadena Rolando, Montes-De-Oca Raúl |
| Keywords: | programming: dynamic |
This work concerns discrete-time Markov decision chains with finite state space and bounded costs. The controller has constant risk sensitivity λ, and the performance of a control policy is measured by the corresponding risk-sensitive average cost criterion. Assuming that the optimality equation has a solution, it is shown that the value iteration scheme can be implemented to obtain, in a finite number of steps, (1) an approximation to the optimal λ-sensitive average cost with an error less than a given tolerance, and (2) a stationary policy whose performance index is arbitrarily close to the optimal value. The argument used to establish these results is based on a modification of the original model, which is an extension of a transformation introduced by Schweitzer to analyze the risk-neutral case.
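To make the scheme described above concrete, here is a minimal Python sketch of a risk-sensitive value iteration for a finite MDP. This is an illustration under standard textbook conventions, not the paper's exact algorithm: the multiplicative Bellman operator is written in log form, the normalization step and the span-seminorm stopping rule are assumed choices, and all names (`P`, `c`, `lam`) are hypothetical.

```python
import numpy as np

def risk_sensitive_value_iteration(P, c, lam, tol=1e-6, max_iter=10_000):
    """Illustrative value iteration for the risk-sensitive average cost criterion.

    P   : array of shape (A, S, S); P[a, x, y] = transition probability
    c   : array of shape (S, A); c[x, a] = bounded one-stage cost
    lam : constant risk-sensitivity parameter (lam != 0)

    Sketch only: operator, normalization, and stopping rule are
    assumptions for illustration, not taken from the paper.
    """
    A, S, _ = P.shape
    h = np.zeros(S)  # relative value function
    for _ in range(max_iter):
        # Risk-sensitive Bellman operator in log form:
        # Q[x, a] = c[x, a] + (1/lam) * log sum_y P[a, x, y] * exp(lam * h[y])
        Q = c + (1.0 / lam) * np.log(np.einsum('axy,y->xa', P, np.exp(lam * h)))
        h_new = Q.min(axis=1)
        diff = h_new - h
        # Span-seminorm stopping rule: when the increments flatten out,
        # their midpoint estimates the optimal lam-sensitive average cost.
        if diff.max() - diff.min() < tol:
            g = 0.5 * (diff.max() + diff.min())  # average-cost estimate
            policy = Q.argmin(axis=1)            # greedy stationary policy
            return g, h_new - h_new.min(), policy
        h = h_new - h_new.min()  # normalize to keep iterates bounded
    raise RuntimeError("value iteration did not converge within max_iter")
```

The normalization `h - h.min()` keeps the iterates bounded (important here, since the exponential utility can overflow numerically), and the span-based stopping rule mirrors the standard risk-neutral termination criterion; the abstract's claim is precisely that, when the optimality equation has a solution, such a finite-step scheme yields both the cost approximation and a near-optimal stationary policy.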