Randomized and past-dependent policies for Markov decision processes with multiple constraints

Article ID: iaor1989298
Country: United States
Volume: 37
Issue: 3
Start Page Number: 474
End Page Number: 477
Publication Date: May 1989
Journal: Operations Research
Authors:
Abstract:

The Markov decision problem of finding a policy that maximizes the long-run average reward subject to K long-run average cost constraints is considered. It is assumed that the state and action spaces are finite and that the law of motion is unichain, that is, every pure policy gives rise to a Markov chain with a single recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K states. A linear program produces this optimal policy with limited randomization. For the special case of a single constraint, the paper also addresses the problem of finding optimal nonrandomized, but nonstationary, policies. It shows that a round-robin-type policy is optimal, and conjectures the same for a steering policy that depends on the entire past history of the process but whose implementation requires essentially no more storage than that of a pure policy.
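The linear-programming approach the abstract refers to is, in its standard textbook form, posed over long-run state-action frequencies. The sketch below illustrates that formulation on a small hypothetical unichain instance; the arrays P, r, c, b, the scipy solver, and the random instance are illustrative assumptions, not the paper's notation or exact program.

import numpy as np
from scipy.optimize import linprog

# Hypothetical unichain instance: S states, A actions, K = 1 cost constraint.
S, A, K = 3, 2, 1
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
r = rng.uniform(size=(S, A))                 # reward per state-action pair
c = rng.uniform(size=(K, S, A))              # K cost functions
b = np.full(K, 0.7)                          # cost budgets

# Variables x[s, a] = long-run state-action frequencies.
# Maximize sum x*r subject to:
#   balance:       sum_a x[s', a] = sum_{s, a} x[s, a] * P[s, a, s']  for every s'
#   normalization: sum x = 1
#   costs:         sum x * c[k] <= b[k]  for k = 1..K
n = S * A
A_eq = np.zeros((S + 1, n))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = (s == sp) - P[s, a, sp]
A_eq[S, :] = 1.0
b_eq = np.append(np.zeros(S), 1.0)

res = linprog(-r.reshape(n), A_ub=c.reshape(K, n), b_ub=b,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n, method="highs")
assert res.status == 0, res.message

# Recover a stationary randomized policy: pi(a | s) = x[s, a] / sum_a x[s, a].
# States with zero total frequency are transient; any action may be chosen there.
x = res.x.reshape(S, A)
tot = x.sum(axis=1, keepdims=True)
pi = np.where(tot > 1e-12, x / np.where(tot > 1e-12, tot, 1.0), 1.0 / A)
print("optimal average reward:", -res.fun)
print("policy pi(a|s):\n", pi.round(3))

The connection to the paper's structural result can be read off this LP by the standard basic-solution counting argument: a basic optimal solution has at most as many positive entries as there are linearly independent binding constraints, roughly S + K here, so at most K states are assigned more than one action with positive probability.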
