Article ID: iaor20013614
Country: Germany
Volume: 52
Issue: 1
Start Page Number: 133
End Page Number: 167
Publication Date: Jan 2000
Journal: Mathematical Methods of Operations Research (Heidelberg)
Authors: Montes-de-Oca R., Cavazos-Cadena R.
Keywords: Markov processes
This note concerns Markov decision processes on a discrete state space. It is supposed that the reward function is nonnegative and that the decision maker has a nonnull constant risk-sensitivity, which leads to grading random rewards via the expectation of an exponential utility function. The performance index is the risk-sensitive expected-total-reward criterion, and the existence of approximately optimal stationary policies, in both the absolute and the relative sense, is studied. The main results, derived under mild conditions, extend classical theorems in risk-neutral positive dynamic programming and can be summarized as follows: assuming that the optimal value function is finite, it is proved that (i) ϵ-optimal stationary policies exist when the state and action spaces are both finite, and (ii) this conclusion extends to the denumerable state space case whenever (a) the decision maker is risk-averse and (b) the optimal value function is bounded. This latter result is a (weak) risk-sensitive version of a classical theorem of Ornstein (1969).
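Under the standard exponential-utility formulation of this criterion (an assumption about the setup, since the abstract does not spell it out), a risk-sensitivity coefficient λ ≠ 0 grades a random total reward R by sign(λ)·E[exp(λR)], with λ < 0 corresponding to risk aversion. The sketch below is a minimal, hypothetical illustration of result (i) on a toy finite MDP: multiplicative value iteration computes W(x) ≈ E_x[exp(λ · total reward)] under an optimal policy and reads off a stationary policy. The MDP, its transition and reward numbers, and the coefficient are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Toy finite MDP (hypothetical; not from the paper). State 2 is absorbing
# with zero reward, so the total reward along any trajectory is a.s. finite.
n_states, n_actions = 3, 2
# P[x, a, y]: probability of moving from state x to state y under action a.
P = np.array([
    [[0.2, 0.5, 0.3], [0.0, 0.4, 0.6]],  # from state 0
    [[0.0, 0.3, 0.7], [0.1, 0.1, 0.8]],  # from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2 absorbs
])
# r[x, a]: nonnegative one-step reward.
r = np.array([[1.0, 2.0], [0.5, 1.5], [0.0, 0.0]])

lam = -0.5  # risk-sensitivity coefficient; lam < 0 models risk aversion

# For lam < 0 the utility of a total reward R is -exp(lam * R), so maximizing
# expected utility amounts to minimizing W(x) = E_x[exp(lam * total reward)];
# for lam > 0 it amounts to maximizing W.
opt, argopt = (np.min, np.argmin) if lam < 0 else (np.max, np.argmax)

W = np.ones(n_states)  # exp(lam * 0): value when no reward remains to be earned
for _ in range(10_000):
    # Multiplicative Bellman backup:
    # Q[x, a] = exp(lam * r(x, a)) * sum_y P(y | x, a) * W(y)
    Q = np.exp(lam * r) * np.einsum('xay,y->xa', P, W)
    W_new = opt(Q, axis=1)
    if np.max(np.abs(W_new - W)) < 1e-12:
        W = W_new
        break
    W = W_new

policy = argopt(Q, axis=1)    # a stationary policy read off the last backup
cert_equiv = np.log(W) / lam  # certainty equivalent (risk-adjusted value)
print("stationary policy:    ", policy)
print("certainty equivalents:", cert_equiv)
```

Because the state and action sets here are finite and the optimal value is finite, the backup converges and the greedy stationary policy it produces is ϵ-optimal for small enough tolerance, in line with result (i); the quantity log(W)/λ is the usual certainty-equivalent reading of the exponential criterion.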