Article ID: | iaor19931525 |
Country: | United States |
Volume: | 17 |
Issue: | 4 |
Start Page Number: | 921 |
End Page Number: | 931 |
Publication Date: | Nov 1992 |
Journal: | Mathematics of Operations Research |
Authors: | Maitra A., Sudderth W. |
Keywords: | programming: markov decision |
The authors consider the negative dynamic programming model of Strauch and prove that the optimal reward function can be obtained by a transfinite iteration of the optimal reward operator. They also show that a player loses nothing by being restricted to measurable policies when the returns from nonmeasurable policies are evaluated by lower integrals.
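To make "iteration of the optimal reward operator" concrete, here is a minimal sketch that assumes a finite, undiscounted model with all one-step rewards nonpositive (the "negative" case). The operator is T v(s) = max over actions a of the sum over s' of p(s'|s,a) (r(s,a,s') + v(s')), iterated from v = 0. The model encoding, the function names `bellman` and `value_iteration`, the tolerance-based stopping rule, and the three-state example are all hypothetical illustrations, not taken from the paper. In a finite model like this one, countably many iterations suffice. The paper's transfinite result is aimed at general models, where the decreasing limit of T^n 0 may still lie strictly above the optimal reward function.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: state -> action -> list of (probability, next_state, reward).
Model = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def bellman(model: Model, v: Dict[str, float]) -> Dict[str, float]:
    """Apply the optimal reward operator T once: best expected one-step reward plus v."""
    return {
        s: max(
            sum(p * (r + v[t]) for p, t, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in model.items()
    }


def value_iteration(model: Model, tol: float = 1e-12, max_iter: int = 10_000) -> Dict[str, float]:
    """Iterate T from v = 0 until successive iterates differ by less than tol."""
    # Negative dynamic programming assumes every reward is <= 0.
    for actions in model.values():
        for outcomes in actions.values():
            if any(r > 0 for _, _, r in outcomes):
                raise ValueError("negative model requires nonpositive rewards")
    v = {s: 0.0 for s in model}
    for _ in range(max_iter):
        w = bellman(model, v)  # with nonpositive rewards, iterates decrease monotonically
        if max(abs(w[s] - v[s]) for s in model) < tol:
            return w
        v = w
    return v


# Hypothetical example: "stop" is absorbing with reward 0.
EXAMPLE: Model = {
    "a": {"go": [(1.0, "stop", -2.0)], "wait": [(1.0, "b", -1.0)]},
    "b": {"quit": [(1.0, "stop", -0.5)], "back": [(1.0, "a", 0.0)]},
    "stop": {"stay": [(1.0, "stop", 0.0)]},
}

if __name__ == "__main__":
    print(value_iteration(EXAMPLE))
```

On this example the iterates move from 0 down to a = -1.5, b = -0.5, stop = 0 after a few steps and then stay fixed. That fixed point is the optimal reward: from "a", paying -1 to reach "b" and then -0.5 to stop beats stopping directly at -2.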