| Article ID: | iaor1988748 |
| Country: | United States |
| Volume: | 14 |
| Issue: | 1 |
| Start Page Number: | 147 |
| End Page Number: | 161 |
| Publication Date: | Feb 1989 |
| Journal: | Mathematics of Operations Research |
| Authors: | Filar Jerzy A., Kallenberg C.M., Lee Huey-Miin |
The authors consider a Markov decision process under both the expected limiting-average and the discounted total-return criteria, each modified to include a penalty for variability in the stream of rewards. In both cases they formulate nonlinear programs in the space of (averaged or discounted) state-action frequencies whose optimal solutions are shown to be related to optimal policies of the corresponding ‘variance-penalized MDP’. The analysis of one of the discounted cases is facilitated by the introduction of a ‘Cartesian product of two independent MDPs’.
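To make the variance-penalized criterion concrete, the following sketch evaluates a mean-minus-variance objective of the assumed form mean(π) − λ·var(π) over the stationary deterministic policies of a hypothetical two-state, two-action MDP, using limiting state frequencies as the weights. This is only an illustration of the kind of objective the abstract describes, not the paper's nonlinear-programming formulation; the MDP data, the penalty weight λ, and the exact penalty form are all assumptions.

```python
# Illustrative sketch (not the paper's algorithm): score each stationary
# deterministic policy of a tiny ergodic MDP by
#     phi(pi) = mean(pi) - lam * var(pi),
# where mean and var are the long-run average and variability of the
# reward stream under the limiting state frequencies.
import itertools
import numpy as np

# Hypothetical MDP: P[a][s] is the next-state distribution for action a
# taken in state s; R[a][s] is the corresponding one-step reward.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.6, 0.4]])}
R = {0: np.array([1.0, 4.0]),
     1: np.array([2.0, 3.0])}
lam = 0.5  # assumed variance-penalty weight

def stationary_dist(P_pi):
    """Stationary distribution x with x = x P_pi, sum(x) = 1."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

best = None
for policy in itertools.product([0, 1], repeat=2):  # one action per state
    P_pi = np.array([P[policy[s]][s] for s in range(2)])
    r_pi = np.array([R[policy[s]][s] for s in range(2)])
    x = stationary_dist(P_pi)           # limiting state frequencies
    mean = x @ r_pi                     # long-run average reward
    var = x @ (r_pi - mean) ** 2        # variability of the reward stream
    score = mean - lam * var
    if best is None or score > best[0]:
        best = (score, policy, mean, var)

print(best)
```

Enumerating policies is only feasible because the example is tiny; the point of working in the space of state-action frequencies, as the abstract indicates, is precisely to replace such enumeration with a single (nonlinear) mathematical program.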