| Article ID: | iaor19931485 |
| Country: | United States |
| Volume: | 40 |
| Issue: | 6 |
| Start Page Number: | 1180 |
| End Page Number: | 1187 |
| Publication Date: | Nov 1992 |
| Journal: | Operations Research |
| Authors: | Filar Jerzy A., Krass Dmitry, Sinha Sagnik S. |
| Keywords: | decision theory |
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to ‘neglect’ the future, concentrating on short-term rewards, while the second tends to neglect the short term. The authors consider a new reward criterion consisting of a weighted combination of these two, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying the weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; and an optimal policy might not exist. The authors present an iterative algorithm for computing an ε-optimal policy.
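A minimal sketch of such a weighted criterion, for concreteness (the symbols λ for the weight, β for the discount factor, and the (1−β) normalization that puts both terms on a common per-period scale are assumptions here, not quoted from the paper):

```latex
% Discounted reward of policy \pi from initial state s, discount factor \beta:
\[
V_\beta(\pi,s) \;=\; \mathbb{E}_s^{\pi}\Bigl[\sum_{t=0}^{\infty}\beta^{t}\,r(X_t,A_t)\Bigr],
\qquad \beta\in(0,1)
\]
% Long-term average reward of \pi from s:
\[
G(\pi,s) \;=\; \liminf_{T\to\infty}\frac{1}{T}\,
\mathbb{E}_s^{\pi}\Bigl[\sum_{t=0}^{T-1} r(X_t,A_t)\Bigr]
\]
% Weighted criterion with weight \lambda:
\[
W_\lambda(\pi,s) \;=\; \lambda\,(1-\beta)\,V_\beta(\pi,s) \;+\; (1-\lambda)\,G(\pi,s),
\qquad \lambda\in[0,1]
\]
```

Under this reading, λ = 1 recovers a (rescaled) discounted criterion and λ = 0 the average-reward criterion; intermediate values trade short-term against long-term performance, which is what allows nonstationary policies to dominate stationary ones under the combined objective.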