Multichain Markov decision processes with a sample path constraint: A decomposition approach


Article ID:	iaor19911749
Country:	United States
Volume:	16
Issue:	1
Start Page Number:	195
End Page Number:	207
Publication Date:	Feb 1991
Journal:	Mathematics of Operations Research
Authors:	Ross Keith W., Varadarajan Ravi

Abstract:

The authors consider finite-state finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch. They study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. The authors establish that if there exists a policy that meets the constraint, then there exists an ε-optimal stationary policy. Furthermore, an algorithm is outlined to locate the ε-optimal stationary policy. The proof of the result hinges on a decomposition of the state space into maximal recurrent classes and a set of transient states.
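
The decomposition mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes that a "maximal recurrent class" can be read as a maximal set of states that some stationary policy keeps closed and mutually reachable, and that the remaining states are transient under every stationary policy. The function name `decompose`, the dictionary layout of `P`, and the four-state example are all hypothetical.

```python
def reachable(succ, start):
    """Return all states reachable from `start` (itself included) along `succ` edges."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def decompose(P):
    """Split a finite MDP into candidate recurrent classes and transient states.

    P[s][a] is a dict {next_state: probability} for action a in state s
    (a hypothetical layout chosen for this sketch).
    Returns (classes, transient), both as sorted lists.
    """
    # Actions still under consideration in each surviving state.
    allowed = {s: set(P[s]) for s in P}
    while True:
        # Directed graph induced by the surviving states and actions.
        succ = {
            s: {t for a in allowed[s] for t, p in P[s][a].items() if p > 0}
            for s in allowed
        }
        reach = {s: reachable(succ, s) for s in allowed}
        # Strongly connected component of s: states t with s -> t and t -> s.
        comp = {
            s: frozenset(t for t in reach[s] if t in reach and s in reach[t])
            for s in allowed
        }
        changed = False
        for s in allowed:
            # Keep only actions that cannot leave the component of s.
            keep = {
                a for a in allowed[s]
                if all(t in comp[s] for t, p in P[s][a].items() if p > 0)
            }
            if keep != allowed[s]:
                allowed[s] = keep
                changed = True
        # A state with no surviving action cannot belong to any closed class.
        for s in [s for s in allowed if not allowed[s]]:
            del allowed[s]
            changed = True
        if not changed:
            break
    classes = sorted(sorted(c) for c in set(comp.values()))
    transient = sorted(s for s in P if s not in allowed)
    return classes, transient


# Hypothetical four-state example: state 0 must leave, state 1 may stay or
# move on, and states 2 and 3 cycle between each other.
P = {
    0: {"a": {1: 0.5, 2: 0.5}},
    1: {"stay": {1: 1.0}, "go": {2: 1.0}},
    2: {"x": {3: 1.0}},
    3: {"y": {2: 1.0}},
}
classes, transient = decompose(P)
print(classes, transient)
```

The sketch uses plain reachability searches rather than a linear-time strongly connected components routine, which keeps it short but is only suitable for small state spaces.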
