| Article ID: | iaor20023453 |
| Country: | United States |
| Volume: | 15 |
| Issue: | 4 |
| Start Page Number: | 557 |
| End Page Number: | 564 |
| Publication Date: | Jan 2001 |
| Journal: | Probability in the Engineering and Informational Sciences |
| Authors: | Montes-de-Oca R., Cavazos-Cadena R. |
| Keywords: | Markov processes |
This article concerns Markov decision chains with finite state and action spaces, where a control policy is evaluated by the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result usually obtained via a limiting process through the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
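The setting of the abstract can be illustrated with a toy example (not taken from the paper): a finite MDP with nonnegative rewards and an absorbing zero-reward state, so the optimal total-reward value function is finite. The sketch below runs plain value iteration for the expected total-reward criterion starting from the zero function (the iterates increase monotonically to the optimal value) and then reads off a greedy stationary policy; the specific states, transition probabilities, and rewards are made up for illustration.

```python
# Toy finite MDP with nonnegative rewards (illustration only, not the
# paper's model). State 0 is transient; state 1 is absorbing with zero
# reward, so the optimal total-reward value function is finite.
# Actions at state 0:
#   a=0: reward 1.0, move to the absorbing state immediately
#   a=1: reward 0.6, stay at state 0 w.p. 0.5, absorb w.p. 0.5
P = {  # P[(s, a)] = list of (next_state, probability)
    (0, 0): [(1, 1.0)],
    (0, 1): [(0, 0.5), (1, 0.5)],
    (1, 0): [(1, 1.0)],
}
r = {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.0}
states = [0, 1]
actions = {0: [0, 1], 1: [0]}

# Value iteration for the total-reward criterion: start from V_0 = 0;
# V_{n+1}(s) = max_a [ r(s,a) + sum_{s'} P(s'|s,a) V_n(s') ].
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(r[(s, a)] + sum(p * V[t] for t, p in P[(s, a)])
                for a in actions[s])
         for s in states}

# A stationary policy that is greedy w.r.t. the (approximate) optimal
# value function; here it picks a=1 at state 0, since repeatedly
# collecting 0.6 yields total reward 0.6 / (1 - 0.5) = 1.2 > 1.0.
policy = {s: max(actions[s],
                 key=lambda a: r[(s, a)] + sum(p * V[t] for t, p in P[(s, a)]))
          for s in states}
print(V[0], policy[0])  # V[0] ≈ 1.2, policy[0] == 1
```

The key point mirrored here is the one the abstract discusses: because the rewards are nonnegative and the optimal value is finite, a stationary policy obtained greedily from the optimal value function is optimal; the paper's contribution is proving this directly from the total-reward index rather than as a limit of discounted problems.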