| Article ID: | iaor1989725 |
| Country: | Netherlands |
| Volume: | 8 |
| Start Page Number: | 201 |
| End Page Number: | 203 |
| Publication Date: | Apr 1989 |
| Journal: | Operations Research Letters |
| Authors: | Chung Kun-Jen |
A stationary policy in a Markov decision process (MDP) induces a stationary probability distribution of the reward from each initial state. This note addresses the problem of maximizing the ratio of the mean to the standard deviation of that stationary distribution. It concludes that an optimal pure policy exists.
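
As a rough illustration of the quantity in the abstract, the sketch below computes the mean/standard deviation ratio of the stationary reward distribution for each pure policy of a small MDP and picks the best one. Everything in it is an assumption made for illustration and is not taken from the note: the two-state, two-action MDP, the arrays `P` and `R`, the deterministic rewards, and the function names. It also assumes every pure policy yields an ergodic chain, so the stationary distribution does not depend on the initial state. Comparing pure policies only does not establish the note's result, which concerns optimality of a pure policy in general.

```python
from itertools import product
from math import sqrt

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a] is the next-state distribution, R[s][a] a deterministic reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
R = [
    [1.0, 2.0],
    [3.0, 2.5],
]


def stationary_distribution(policy, iters=1000):
    """Approximate the stationary state distribution by power iteration.

    Assumes the chain induced by `policy` is ergodic, so the iteration
    converges to the same limit from any starting distribution.
    """
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [
            sum(dist[s] * P[s][policy[s]][t] for s in range(n))
            for t in range(n)
        ]
    return dist


def mean_std_ratio(policy):
    """Mean divided by standard deviation of the stationary reward."""
    dist = stationary_distribution(policy)
    rewards = [R[s][policy[s]] for s in range(len(P))]
    mean = sum(p * r for p, r in zip(dist, rewards))
    var = sum(p * (r - mean) ** 2 for p, r in zip(dist, rewards))
    # A zero-variance reward with positive mean is treated as an infinite ratio.
    return mean / sqrt(var) if var > 0 else float("inf")


def best_pure_policy():
    """Enumerate all pure (deterministic) policies and keep the best ratio."""
    n_states, n_actions = len(P), len(P[0])
    policies = product(range(n_actions), repeat=n_states)
    return max(policies, key=mean_std_ratio)
```

Calling `best_pure_policy()` returns a tuple giving the chosen action in each state. Exhaustive enumeration is only practical for very small examples, since the number of pure policies grows exponentially with the number of states.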