Article ID: | iaor1989725 |
Country: | Netherlands |
Volume: | 8 |
Start Page Number: | 201 |
End Page Number: | 203 |
Publication Date: | Apr 1989 |
Journal: | Operations Research Letters |
Authors: | Chung Kun-Jen |
A stationary policy in a Markov decision process (MDP) induces, from each initial state, a stationary probability distribution of the reward. This note addresses the problem of maximizing the ratio of the mean to the standard deviation of that stationary distribution. It concludes that an optimal pure policy exists.
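As an illustration of the quantity being maximized, the following minimal sketch computes the mean/standard deviation ratio for each pure (deterministic) stationary policy of a small MDP and picks the largest. It is not taken from the note: the two-state MDP, its transition probabilities and rewards, and the function names are all hypothetical, and it assumes that each policy induces an ergodic chain (so the stationary distribution does not depend on the initial state) and that the reward depends only on the current state and action.

```python
import itertools
import math

# Hypothetical 2-state, 2-action MDP; every number here is illustrative.
# P[s][a] is the next-state distribution, R[s][a] the one-step reward.
P = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.5, 0.5], [0.2, 0.8]],
]
R = [
    [1.0, 2.0],
    [3.0, 4.0],
]


def stationary_distribution(policy, iters=10_000, tol=1e-13):
    """Power iteration on the chain induced by a pure policy.

    Assumes the induced chain is ergodic, so the iteration converges
    to a unique stationary distribution.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][policy[i]][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


def mean_std_ratio(policy):
    """Mean / standard deviation of the reward under the stationary distribution."""
    mu = stationary_distribution(policy)
    rewards = [R[s][policy[s]] for s in range(len(P))]
    mean = sum(m * r for m, r in zip(mu, rewards))
    var = sum(m * (r - mean) ** 2 for m, r in zip(mu, rewards))
    if var <= 0.0:
        # Degenerate case (constant reward): treated as an infinite ratio here.
        return math.copysign(math.inf, mean)
    return mean / math.sqrt(var)


def best_pure_policy():
    """Enumerate all pure policies and return (policy, ratio) with the largest ratio."""
    n_actions = len(P[0])
    policies = itertools.product(range(n_actions), repeat=len(P))
    return max(((pi, mean_std_ratio(pi)) for pi in policies), key=lambda t: t[1])


if __name__ == "__main__":
    policy, ratio = best_pure_policy()
    print("best pure policy:", policy, "ratio:", ratio)
```

The sketch searches pure policies only. It does not compare them against randomized policies, so it illustrates the objective rather than verifying the note's conclusion.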