| Article ID: | iaor19931525 |
| Country: | United States |
| Volume: | 17 |
| Issue: | 4 |
| Start Page Number: | 921 |
| End Page Number: | 931 |
| Publication Date: | Nov 1992 |
| Journal: | Mathematics of Operations Research |
| Authors: | Maitra A., Sudderth W. |
| Keywords: | programming: markov decision |
The authors consider the negative dynamic programming model of Strauch and prove that the optimal reward function can be obtained by transfinite iteration of the optimal reward operator. They also show that a player loses nothing by being restricted to measurable policies, provided the returns from nonmeasurable policies are evaluated by lower integrals.
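For orientation, a minimal sketch of what transfinite iteration of the optimal reward operator looks like in a standard negative dynamic programming model. The notation (state space S, action sets A(x), one-stage reward r ≤ 0, transition law q) is assumed for illustration and is not necessarily the authors' exact formulation.

```latex
% Sketch of transfinite value iteration in a negative (r <= 0) model.
% Notation S, A(x), r, q is assumed, not taken from the paper itself.
\[
  (Tu)(x) = \sup_{a \in A(x)} \Big[\, r(x,a) + \int_S u(y)\, q(dy \mid x,a) \Big],
  \qquad r \le 0 .
\]
\[
  u_0 \equiv 0, \qquad
  u_{\alpha+1} = T u_\alpha, \qquad
  u_\lambda = \inf_{\alpha < \lambda} u_\alpha
  \quad \text{for limit ordinals } \lambda .
\]
% The iterates are nonincreasing, so they stabilize at some ordinal;
% the abstract states that the stable value is the optimal reward function.
```

The infimum at limit ordinals reflects that, with r ≤ 0 and u_0 ≡ 0, the operator T is monotone and Tu_0 ≤ u_0, so the iterates can only decrease.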