| Article ID: | iaor2004695 |
| Country: | United States |
| Volume: | 28 |
| Issue: | 1 |
| Start Page Number: | 194 |
| End Page Number: | 200 |
| Publication Date: | Feb 2003 |
| Journal: | Mathematics of Operations Research |
| Authors: | Golubin A.Y. |
| Keywords: | programming: dynamic |
The undiscounted, unichain, finite-state Markov decision process with compact action space is studied. We provide a counterexample to a result of Hordijk and Puterman and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis essentially uses a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
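For orientation, the following is a minimal sketch of standard Howard-style policy iteration for an undiscounted (average-reward) unichain MDP, the procedure whose convergence the paper analyzes. It is not taken from the paper and does not use the authors' two-term matrix representation; the array layout (P[a, s, s'] for transition probabilities, R[s, a] for rewards) and the reference state fixing the relative values are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's construction): policy iteration for an
# average-reward unichain MDP with finitely many states and actions.
import numpy as np

def evaluate(policy, P, R, ref=0):
    """Solve g*1 + (I - P_pi) h = r_pi with h[ref] = 0 for the gain g and relative values h."""
    n = R.shape[0]
    P_pi = np.array([P[policy[s], s, :] for s in range(n)])
    r_pi = np.array([R[s, policy[s]] for s in range(n)])
    # Unknowns: (g, h[s] for s != ref), i.e. n equations in n unknowns.
    A = np.zeros((n, n))
    A[:, 0] = 1.0                          # column for the gain g
    keep = [s for s in range(n) if s != ref]
    A[:, 1:] = (np.eye(n) - P_pi)[:, keep]
    sol = np.linalg.solve(A, r_pi)
    g = sol[0]
    h = np.zeros(n)
    h[keep] = sol[1:]
    return g, h

def policy_iteration(P, R, ref=0, max_iter=1000, tol=1e-10):
    n, m = R.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(policy, P, R, ref)
        # Improvement step: greedy with respect to the relative values h,
        # keeping the current action unless a strictly better one exists.
        Q = np.array([[R[s, a] + P[a, s, :] @ h for a in range(m)] for s in range(n)])
        new_policy = policy.copy()
        for s in range(n):
            best = int(Q[s].argmax())
            if Q[s, best] > Q[s, policy[s]] + tol:
                new_policy[s] = best
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
    return policy, g, h
```

In this sketch the evaluation step relies on the unichain assumption (a single recurrent class under every stationary policy), which makes the linear system with one relative value pinned to zero nonsingular; the condition studied in the paper, a state recurrent under every stationary policy, plays an analogous role in the convergence argument.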