Article ID: iaor2004695
Country: United States
Volume: 28
Issue: 1
Start Page Number: 194
End Page Number: 200
Publication Date: Feb 2003
Journal: Mathematics of Operations Research
Authors: Golubin A.Y.
Keywords: programming: dynamic
The undiscounted, unichain, finite state Markov decision process with compact action space is studied. We provide a counterexample for a result in Hordijk and Puterman and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis essentially uses a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
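For orientation only, the sketch below illustrates standard policy iteration for an undiscounted (average-reward) unichain MDP with finitely many actions; it is not the paper's construction, which treats compact action spaces, and the array layouts P, r, and the reference state ref are illustrative assumptions. Each evaluation step solves for a gain g and a relative value (bias) vector h normalized by h[ref] = 0.

```python
import numpy as np

def evaluate_policy(P, r, policy, ref=0):
    """Solve the average-reward evaluation equations for a stationary policy:
        g + h[s] = r[s, policy[s]] + sum_{s'} P[policy[s], s, s'] * h[s'],
    with the relative value vector normalized by h[ref] = 0 (unichain case)."""
    n = P.shape[1]
    P_d = P[policy, np.arange(n), :]        # transition matrix under the policy, shape (n, n)
    r_d = r[np.arange(n), policy]           # one-step rewards under the policy, shape (n,)
    # Unknowns: [g, h[0], ..., h[n-1]]; n evaluation equations plus one normalization.
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[:n, 0] = 1.0                          # coefficient of the gain g
    A[:n, 1:] = np.eye(n) - P_d             # (I - P_d) acting on h
    b[:n] = r_d
    A[n, 1 + ref] = 1.0                     # normalization h[ref] = 0
    x = np.linalg.solve(A, b)
    return x[0], x[1:]                      # gain g, relative values h

def policy_iteration(P, r, max_iter=1000):
    """Plain policy iteration for a finite unichain average-reward MDP.
    P has shape (num_actions, n, n); r has shape (n, num_actions)."""
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate_policy(P, r, policy)
        # Improvement: maximize r[s, a] + sum_{s'} P[a, s, s'] * h[s'] over actions a.
        q = r.T + P @ h                     # shape (num_actions, n)
        best = q.max(axis=0)
        # Keep the current action when it remains optimal, to avoid cycling between
        # policies with equal gain.
        new_policy = np.where(q[policy, np.arange(n)] >= best - 1e-12,
                              policy, np.argmax(q, axis=0))
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
    return policy, g, h
```

The normalization h[ref] = 0 plays the role of fixing the relative value vector at a distinguished state; the paper's recurrence condition (a state recurrent under every stationary policy) is what makes such an anchoring well behaved across iterations.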