| Article ID: | iaor19911749 |
| Country: | United States |
| Volume: | 16 |
| Issue: | 1 |
| Start Page Number: | 195 |
| End Page Number: | 207 |
| Publication Date: | Feb 1991 |
| Journal: | Mathematics of Operations Research |
| Authors: | Ross Keith W., Varadarajan Ravi |
The authors consider finite-state, finite-action Markov decision processes that accumulate both a reward and a cost at each decision epoch. They study the problem of finding a policy that maximizes the expected long-run average reward, subject to the constraint that the long-run average cost be no greater than a given value with probability one. The authors establish that if there exists a policy that meets the constraint, then there exists an