Article ID: | iaor19911749 |
Country: | United States |
Volume: | 16 |
Issue: | 1 |
Start Page Number: | 195 |
End Page Number: | 207 |
Publication Date: | Feb 1991 |
Journal: | Mathematics of Operations Research |
Authors: | Ross Keith W., Varadarajan Ravi |
The authors consider finite-state, finite-action Markov decision processes that accumulate both a reward and a cost at each decision epoch. They study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. The authors establish that if there exists a policy that meets the constraint, then there exists an