Multichain Markov decision processes with a sample path constraint: A decomposition approach

Article ID: iaor19911749
Country: United States
Volume: 16
Issue: 1
Start Page Number: 195
End Page Number: 207
Publication Date: Feb 1991
Journal: Mathematics of Operations Research
Authors: ,
Abstract:

The authors consider finite-state finite-action Markov decision processes which accumulate both a reward and a cost at each decision epoch. They study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. The authors establish that if there exists a policy that meets the constraint, then there exists an ε-optimal stationary policy. Furthermore, an algorithm is outlined to locate the ε-optimal stationary policy. The proof of the result hinges on a decomposition of the state space into maximal recurrent classes and a set of transient states.
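
The problem described in the abstract can be written out as follows. This is a sketch only: the symbols are assumptions introduced here for illustration and are not taken from the article. X_t and A_t denote the state and action at epoch t, r and c the one-step reward and cost, α the given cost bound, and u a policy. The use of lim inf for the reward and lim sup for the cost is likewise an assumed convention.

```latex
\begin{align*}
\text{maximize over } u \quad
  & \phi(u) = \liminf_{n\to\infty} \frac{1}{n}\,
    \mathbb{E}_u\!\left[\sum_{t=1}^{n} r(X_t, A_t)\right] \\
\text{subject to} \quad
  & \mathbb{P}_u\!\left(\limsup_{n\to\infty} \frac{1}{n}
    \sum_{t=1}^{n} c(X_t, A_t) \le \alpha\right) = 1 .
\end{align*}
% Under this notation, a feasible policy u_eps is eps-optimal if
%   phi(u_eps) >= sup { phi(u) : u feasible } - eps.
```

The constraint is a sample path constraint because it must hold on almost every trajectory, not merely in expectation.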
