Article ID: | iaor19931485 |
Country: | United States |
Volume: | 40 |
Issue: | 6 |
Start Page Number: | 1180 |
End Page Number: | 1187 |
Publication Date: | Nov 1992 |
Journal: | Operations Research |
Authors: | Filar Jerzy A., Krass Dmitry, Sinha Sagnik S. |
Keywords: | decision theory |
The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to ‘neglect’ the future, concentrating on short-term rewards, while the second tends to do the opposite. The authors consider a new reward criterion consisting of a weighted combination of these two criteria, thereby allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; and an optimal policy might not exist. The authors present an iterative algorithm for computing an ε-optimal nonstationary policy with a simple structure.
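A plausible formalization of the weighted criterion, sketched here for illustration (the symbols V_β, V_avg, the weight λ, and the normalizing factor (1−β) are assumptions, not taken from the record): for a policy π, initial state s, and discount factor β ∈ (0,1),

% Hedged sketch of the weighted reward criterion; the exact normalization
% chosen by the authors may differ from the (1-\beta) factor assumed here.
\[
  V_\lambda(\pi,s) \;=\; \lambda\,(1-\beta)\,V_\beta(\pi,s) \;+\; (1-\lambda)\,V_{\mathrm{avg}}(\pi,s),
  \qquad \lambda \in [0,1],
\]
where
\[
  V_\beta(\pi,s) = \mathbb{E}_s^{\pi}\!\left[\,\sum_{t=0}^{\infty} \beta^{t}\, r(X_t,A_t)\right],
  \qquad
  V_{\mathrm{avg}}(\pi,s) = \liminf_{T\to\infty} \frac{1}{T}\,\mathbb{E}_s^{\pi}\!\left[\,\sum_{t=0}^{T-1} r(X_t,A_t)\right].
\]
Setting λ = 1 recovers the (normalized) discounted criterion and λ = 0 the average criterion; intermediate weights trade short-term rewards against long-term rewards.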