Article ID: | iaor1993746 |
Country: | United States |
Volume: | 17 |
Issue: | 3 |
Start Page Number: | 558 |
End Page Number: | 571 |
Publication Date: | Aug 1992 |
Journal: | Mathematics of Operations Research |
Authors: | Baykal-Gursoy Melike, Ross K.W. |
Considered are time-average Markov decision processes (MDPs) with finite state and action spaces. Two definitions of variability are introduced: the expected time-average variability and the time-average expected variability. The two criteria differ in general, although both can be used to penalize variance in the stream of rewards. For communicating MDPs, the authors construct a (randomized) stationary policy that is