An algorithm to identify and compute average optimal policies in multichain Markov decision processes

Article ID: iaor20072035
Country: United States
Volume: 28
Issue: 3
Start Page Number: 553
End Page Number: 586
Publication Date: Aug 2003
Journal: Mathematics of Operations Research
Authors:
Abstract:

This paper concerns discrete-time, finite-state multichain Markov decision processes (MDPs) with compact action sets. The optimality criterion is long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ϵ‐optimal policies that are stationary Markovian, and develop an algorithm that computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy that is stationary Markovian; when such a policy exists, the algorithm computes it.
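
The non-existence claim can be made concrete with a small toy model. The sketch below is a hypothetical example constructed for illustration; it is not taken from the paper and it is not the paper's algorithm. It assumes three states, where states 1 and 2 are absorbing with costs 0 and 1, and state 0 has cost 1 and a compact action set [0, 0.5]. Choosing action a in state 0 moves the process to state 1 with probability a, to state 2 with probability a², and otherwise leaves it in state 0. The names `transition_matrix`, `average_cost`, `gain_from_state_0`, and `COSTS` are all assumptions of this sketch.

```python
# Toy multichain MDP (hypothetical, not from the paper).
# State 0: cost 1, action a in [0, 0.5]. States 1 and 2: absorbing, costs 0 and 1.
COSTS = [1.0, 0.0, 1.0]


def transition_matrix(a):
    """Chain induced by the stationary policy that always plays `a` in state 0."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("action must lie in the compact set [0, 0.5]")
    return [
        [1.0 - a - a * a, a, a * a],  # stay, absorb in state 1, absorb in state 2
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]


def average_cost(P, c, horizon=50_000):
    """Approximate the long-run average cost from each start state by the
    Cesaro average (1/N) * sum_{n<N} P^n c, computed by repeated backups."""
    n = len(c)
    v = [0.0] * n  # v holds the expected total cost over the steps taken so far
    for _ in range(horizon):
        v = [c[i] + sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [x / horizon for x in v]


def gain_from_state_0(a):
    """Closed form for this toy model: for a > 0 the process is absorbed in
    state 2 with probability a^2 / (a + a^2) = a / (1 + a); for a = 0 it
    stays in state 0 forever and pays cost 1 per step."""
    return 1.0 if a == 0.0 else a / (1.0 + a)


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.01, 0.0):
        simulated = average_cost(transition_matrix(a), COSTS)[0]
        print(f"a={a}: closed form {gain_from_state_0(a):.4f}, Cesaro {simulated:.4f}")
```

In this toy model the average cost from state 0 under a stationary policy is a/(1+a) for a > 0, which tends to 0 as a shrinks, yet the limiting action a = 0 gives average cost 1. The infimum over stationary policies is therefore not attained, while for any ϵ > 0 a sufficiently small positive a is ϵ‐optimal among them. The average cost also depends on the starting state (0 from state 1, 1 from state 2), which is the multichain feature.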
