| Article ID: | iaor19911749 | 
| Country: | United States | 
| Volume: | 16 | 
| Issue: | 1 | 
| Start Page Number: | 195 | 
| End Page Number: | 207 | 
| Publication Date: | Feb 1991 | 
| Journal: | Mathematics of Operations Research | 
| Authors: | Ross Keith W., Varadarajan Ravi | 
The authors consider finite-state, finite-action Markov decision processes that accumulate both a reward and a cost at each decision epoch. They study the problem of finding a policy that maximizes the expected long-run average reward subject to the constraint that the long-run average cost be no greater than a given value with probability one. The authors establish that if there exists a policy that meets the constraint, then there exists an