A note on the convergence of policy iteration in Markov decision processes with compact action spaces

Article ID: iaor2004695
Country: United States
Volume: 28
Issue: 1
Start Page Number: 194
End Page Number: 200
Publication Date: Feb 2003
Journal: Mathematics of Operations Research
Authors:
Keywords: programming: dynamic
Abstract:

The undiscounted, unichain, finite-state Markov decision process with a compact action space is studied. We provide a counterexample to a result of Hordijk and Puterman and give an alternate proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis essentially uses a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
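For readers unfamiliar with the average-reward setting, the procedure analyzed above can be illustrated on a toy problem. The sketch below is not the paper's compact-action construction: it is a minimal, finite-action, two-state unichain example (all transition probabilities and rewards are invented for illustration) in which each evaluation step solves the relative value equations g + h(s) = r(s) + Σ_{s'} P(s'|s) h(s') with the normalization h(0) = 0, and each improvement step is greedy against the relative values h. In this example state 0 is recurrent under every stationary policy, matching the recurrence condition used in the convergence proof.

```python
# Average-reward policy iteration on a tiny 2-state unichain MDP.
# Illustrative sketch only: the MDP data below is made up for the example,
# and the action sets are finite rather than compact.

# P[s][a] = [P(0|s,a), P(1|s,a)],  R[s][a] = expected one-step reward.
P = {
    0: {"stay": [0.9, 0.1], "go": [0.2, 0.8]},
    1: {"stay": [0.1, 0.9], "go": [0.7, 0.3]},
}
R = {
    0: {"stay": 1.0, "go": 0.0},
    1: {"stay": 2.0, "go": 0.5},
}

def evaluate(policy):
    """Solve g + h(s) = R(s, pi(s)) + sum_s' P(s'|s, pi(s)) h(s'),
    with h(0) = 0 (state 0 taken as the reference state)."""
    p0, p1 = P[0][policy[0]], P[1][policy[1]]
    r0, r1 = R[0][policy[0]], R[1][policy[1]]
    # With h(0) = 0 the two equations reduce to:
    #   g = r0 + p0[1] * h1   and   g + h1 = r1 + p1[1] * h1.
    h1 = (r1 - r0) / (p0[1] + 1.0 - p1[1])
    g = r0 + p0[1] * h1
    return g, [0.0, h1]

def improve(h):
    """Greedy improvement against the relative value vector h
    (sufficient in the unichain case; multichain models need a
    gain-based tie-break first)."""
    return {
        s: max(P[s], key=lambda a: R[s][a] + sum(P[s][a][t] * h[t] for t in (0, 1)))
        for s in (0, 1)
    }

policy = {0: "stay", 1: "stay"}
while True:
    gain, h = evaluate(policy)
    new_policy = improve(h)
    if new_policy == policy:
        break
    policy = new_policy

print(policy, round(gain, 4))  # -> {0: 'go', 1: 'stay'} 1.7778
```

The normalization h(0) = 0 is what makes the relative value vector well defined at each iteration; the paper's two-term matrix representation concerns the structure of exactly these vectors as the iteration proceeds.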
