Article ID: | iaor1988748 |
Country: | United States |
Volume: | 14 |
Issue: | 1 |
Start Page Number: | 147 |
End Page Number: | 161 |
Publication Date: | Feb 1989 |
Journal: | Mathematics of Operations Research |
Authors: | Filar Jerzy A., Kallenberg L.C.M., Lee Huey-Miin |
The authors consider a Markov decision process under both the expected limiting average and the discounted total return criteria, each appropriately modified to include a penalty for the variability in the stream of rewards. In both cases they formulate nonlinear programs in the space of state-action frequencies (averaged or discounted, respectively) whose optimal solutions are shown to be related to the optimal policies of the corresponding ‘variance-penalized MDP’. The analysis of one of the discounted cases is facilitated by the introduction of a ‘Cartesian product of two independent MDPs’.
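As a rough sketch of the kind of nonlinear program the abstract describes, consider the average-reward case (the notation here is assumed for illustration, not taken from the paper): write x_{sa} for the long-run state-action frequencies, r(s,a) for the one-step rewards, p(j|s,a) for the transition probabilities, and \lambda \ge 0 for the variance-penalty weight. One plausible form of the variance-penalized program is

\[
\max_{x \ge 0} \;\; \sum_{s,a} x_{sa}\, r(s,a)
\;-\; \lambda \Bigl[\, \sum_{s,a} x_{sa}\, r(s,a)^{2}
\;-\; \Bigl( \sum_{s,a} x_{sa}\, r(s,a) \Bigr)^{2} \Bigr]
\]
subject to the usual average state-action frequency (flow-balance) constraints
\[
\sum_{a} x_{ja} \;=\; \sum_{s,a} p(j \mid s,a)\, x_{sa} \quad \text{for every state } j,
\qquad \sum_{s,a} x_{sa} \;=\; 1 .
\]

The bracketed term is the frequency-weighted variance of the reward stream, so \lambda trades expected reward against variability. In the standard linear-programming treatment of average-reward MDPs (at least for unichain models), a stationary policy can be recovered from an optimal x by setting \pi(a \mid s) = x_{sa} / \sum_{a'} x_{sa'} wherever the denominator is positive.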