Article ID: | iaor19931485 |
Country: | United States |
Volume: | 40 |
Issue: | 6 |
Start Page Number: | 1180 |
End Page Number: | 1187 |
Publication Date: | Nov 1992 |
Journal: | Operations Research |
Authors: | Filar Jerzy A., Krass Dmitry, Sinha Sagnik S. |
Keywords: | decision theory |
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to ‘neglect’ the future, concentrating on short-term rewards, while the second tends to do the opposite. The authors consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; and an optimal policy might not exist. The authors present an iterative algorithm for computing an ε-optimal nonstationary policy with a simple structure.
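A plausible formalization of the weighted criterion, sketched here for illustration (the symbols V_β, V_avg, the weight λ, and the normalizing factor (1−β) are assumptions, not taken from the record): for a policy π, initial state s, and discount factor β ∈ (0,1),

% Hedged sketch of the weighted reward criterion; the exact normalization
% chosen by the authors may differ from the (1-\beta) factor assumed here.
\[
  V_\lambda(\pi,s) \;=\; \lambda\,(1-\beta)\,V_\beta(\pi,s) \;+\; (1-\lambda)\,V_{\mathrm{avg}}(\pi,s),
  \qquad \lambda \in [0,1],
\]
where
\[
  V_\beta(\pi,s) = \mathbb{E}_s^{\pi}\!\left[\,\sum_{t=0}^{\infty} \beta^{t}\, r(X_t,A_t)\right],
  \qquad
  V_{\mathrm{avg}}(\pi,s) = \liminf_{T\to\infty} \frac{1}{T}\,\mathbb{E}_s^{\pi}\!\left[\,\sum_{t=0}^{T-1} r(X_t,A_t)\right].
\]
Setting λ = 1 recovers the (normalized) discounted criterion and λ = 0 the average criterion; intermediate weights trade short-term rewards against long-term rewards.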