A weighted Markov decision process

Article ID: iaor19931485
Country: United States
Volume: 40
Issue: 6
Start Page Number: 1180
End Page Number: 1187
Publication Date: Nov 1992
Journal: Operations Research
Authors:
Keywords: decision theory
Abstract:

The two most commonly considered reward criteria for Markov decision processes are the discounted reward and the long-term average reward. The first tends to ‘neglect’ the future, concentrating on short-term rewards, while the second tends to do the opposite. The authors consider a new reward criterion consisting of a weighted combination of these two criteria, allowing the decision maker to place more or less emphasis on short-term versus long-term rewards by varying their weights. The mathematical implications of the new criterion include: deterministic stationary policies can be outperformed by randomized stationary policies, which in turn can be outperformed by nonstationary policies; an optimal policy might not exist. The authors present an iterative algorithm for computing an ε-optimal nonstationary policy with a very simple structure.
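The weighted criterion can be sketched numerically for a fixed stationary policy. The following is a minimal illustration, not the paper's exact formulation: it assumes a toy 2-state Markov chain induced by the policy, a hypothetical weight `lam` on the discounted criterion, and does not apply any normalization (such as multiplying the discounted value by 1 − β) that the authors may use to make the two criteria comparable.

```python
import numpy as np

# Hypothetical toy example: a fixed stationary policy on a 2-state chain.
# P is the transition matrix induced by the policy, r the one-step rewards.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 2.0])

beta = 0.9   # discount factor (assumed value)
lam = 0.5    # weight on the discounted criterion (assumed name and value)

# Discounted reward: v solves v = r + beta * P v, i.e. (I - beta P) v = r.
v_disc = np.linalg.solve(np.eye(2) - beta * P, r)

# Long-run average reward: g = pi @ r, where pi is the stationary
# distribution solving pi P = pi with sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
g = pi @ r

# Weighted criterion: a convex combination of the two (per-state for the
# discounted value, a scalar for the average reward, broadcast over states).
weighted = lam * v_disc + (1 - lam) * g
```

For a fixed policy this is a straightforward evaluation; the paper's difficulty lies in *optimizing* the weighted criterion, where (as the abstract notes) stationary policies can be strictly suboptimal and an optimal policy may fail to exist.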
