Variability sensitive Markov Decision Processes

Article ID:	iaor1993746
Country:	United States
Volume:	17
Issue:	3
Start Page Number:	558
End Page Number:	571
Publication Date:	Aug 1992
Journal:	Mathematics of Operations Research
Authors:	Baykal-Gursoy Melike, Ross K.W.

Abstract:

Considered are time-average Markov Decision Processes (MDPs) with finite state and action spaces. Two definitions of variability are introduced, namely, the expected time-average variability and time-average expected variability. The two criteria are in general different, although they can both be employed to penalize for variance in the stream of rewards. For communicating MDPs, the authors construct a (randomized) stationary policy that is ε-optimal for both criteria; the policy is optimal and pure for a specific variability function. For general multichain MDPs, a state space decomposition leads to a similar result for the expected time-average variability. The authors also consider the problem of the decision maker choosing the initial state along with the policy.
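As a rough illustration of what penalizing variance in a time-average MDP can look like, the sketch below evaluates a fixed stationary policy on a small chain. It is not the authors' construction: the two-state MDP, its numbers, the function names, the squared-deviation variability measure, and the penalty weight `lam` are all assumptions made for this example. It computes the stationary distribution of the chain induced by the policy, then the long-run average reward, the variance of the one-step reward under that distribution, and the penalized objective (mean minus `lam` times variance).

```python
# Hypothetical two-state, two-action MDP; every number here is illustrative.
# P[s][a] is the next-state distribution, R[s][a] the one-step reward.
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}
R = {
    0: {0: 1.0, 1: 0.0},
    1: {0: 4.0, 1: 2.0},
}


def stationary_distribution(T, iters=10_000):
    """Power iteration for the stationary distribution of a transition matrix T.

    Assumes the induced chain is irreducible and aperiodic.
    """
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * T[s][t] for s in range(n)) for t in range(n)]
    return pi


def evaluate(policy, lam):
    """Evaluate a (possibly randomized) stationary policy.

    policy[s][a] is the probability of choosing action a in state s.
    Returns (average reward, reward variance, penalized objective).
    """
    n = len(P)
    # Transition matrix of the Markov chain induced by the policy.
    T = [
        [sum(policy[s][a] * P[s][a][t] for a in P[s]) for t in range(n)]
        for s in range(n)
    ]
    pi = stationary_distribution(T)
    # Long-run state-action frequencies x(s, a) = pi(s) * policy(a | s).
    x = {(s, a): pi[s] * policy[s][a] for s in P for a in P[s]}
    mean = sum(freq * R[s][a] for (s, a), freq in x.items())
    # Assumed variability measure: squared deviation from the average reward.
    var = sum(freq * (R[s][a] - mean) ** 2 for (s, a), freq in x.items())
    return mean, var, mean - lam * var


# A pure policy (always action 0) and a randomized one, for comparison.
pure = [[1.0, 0.0], [1.0, 0.0]]
mixed = [[0.5, 0.5], [0.5, 0.5]]
print(evaluate(pure, lam=0.5))
print(evaluate(mixed, lam=0.5))
```

Comparing the penalized objective across policies gives a feel for the trade-off between average reward and variability; a larger `lam` favors policies with steadier rewards.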