Article ID: iaor20133931
Volume: 206
Issue: 1
Start Page Number: 197
End Page Number: 219
Publication Date: Jul 2013
Journal: Annals of Operations Research
Authors: Guo Xianping, Huang Yonghui, Wei Qingda
Keywords: queues: applications
This paper deals with constrained Markov decision processes (MDPs) with first passage criteria. The objective is to maximize the expected reward accumulated up to the first passage time to a given target set, subject to a constraint on the associated expected cost over the same first passage time. The state space is denumerable, and the rewards and costs may be unbounded. In addition, the discount factor is state-action dependent and may equal one. We develop conditions for the existence of a constrained optimal policy that generalize those for constrained MDPs under the standard discounted criterion. Moreover, we show that the constrained optimal policy randomizes between two stationary policies that differ in at most one state. Finally, we illustrate our results with a controlled queueing system, an example that shows some advantages of our optimality conditions.
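As a hedged illustration only, the Python sketch below evaluates a first passage criterion on a small finite model and mixes two stationary policies that differ in one state. It assumes a finite state space and bounded rewards (the paper is more general), and it assumes the criterion satisfies V(x) = Σ_a π(a|x)[g(x,a) + α(x,a) Σ_y p(y|x,a) V(y)] off the target set B, with V = 0 on B. The model, the numbers, and the helper names `first_passage_value`, `mix`, and `binding_q` are hypothetical and are not taken from the paper. Bisection picks the mixing probability that makes the cost constraint hold with equality. The sketch does not search for the optimal pair of policies.

```python
from typing import Dict, List, Sequence, Set

Policy = List[Dict[int, float]]  # policy[x] maps action -> probability


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A v = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        v[i] = (M[i][n] - sum(M[i][j] * v[j] for j in range(i + 1, n))) / M[i][i]
    return v


def first_passage_value(P, alpha, g, policy: Policy, target: Set[int]) -> List[float]:
    """Expected (state-action discounted) total of g collected before entering
    `target`, for every start state; 0 on the target. Assumes the linear system
    is nonsingular, e.g. the target is reached with probability one when alpha = 1."""
    free = [x for x in range(len(P)) if x not in target]
    idx = {x: i for i, x in enumerate(free)}
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for x in free:
        i = idx[x]
        A[i][i] += 1.0
        for a, pa in policy[x].items():
            b[i] += pa * g[x][a]
            for y, pxy in enumerate(P[x][a]):
                if y in idx:  # transitions into the target contribute 0
                    A[i][idx[y]] -= pa * alpha[x][a] * pxy
    v = solve(A, b)
    return [0.0 if x in target else v[idx[x]] for x in range(len(P))]


def mix(f1: Sequence[int], f2: Sequence[int], q: float) -> Policy:
    """Use f1 w.p. q and f2 w.p. 1 - q in the (at most one) state where they differ."""
    assert sum(a != b for a, b in zip(f1, f2)) <= 1, "policies differ in >1 state"
    return [{a: 1.0} if a == b else {a: q, b: 1.0 - q} for a, b in zip(f1, f2)]


def binding_q(P, alpha, c, f1, f2, target, x0, budget, tol=1e-10) -> float:
    """Bisection for q in [0, 1] so that the expected cost from x0 equals budget.
    Assumes the costs at q = 0 and q = 1 bracket the budget."""
    def cost(q: float) -> float:
        return first_passage_value(P, alpha, c, mix(f1, f2, q), target)[x0]

    lo, hi = 0.0, 1.0
    g_lo = cost(lo) - budget
    if g_lo * (cost(hi) - budget) > 0:
        raise ValueError("budget is not bracketed by the two policies' costs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = cost(mid) - budget
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


# Hypothetical 3-state example: states 0 and 1 are transient, state 2 is the target.
P = [
    [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]],  # state 0: action 0 (costly), action 1 (free)
    [[0.0, 0.0, 1.0]],                   # state 1: single action
    [[0.0, 0.0, 1.0]],                   # state 2: target (never evaluated)
]
ALPHA = [[1.0, 1.0], [0.5], [1.0]]  # discount factors may equal one
R = [[2.0, 1.0], [1.0], [0.0]]
C = [[1.0, 0.0], [0.0], [0.0]]
TARGET = {2}
F1, F2 = [0, 0, 0], [1, 0, 0]  # deterministic policies differing only in state 0

q_star = binding_q(P, ALPHA, C, F1, F2, TARGET, x0=0, budget=1.0)
reward_star = first_passage_value(P, ALPHA, R, mix(F1, F2, q_star), TARGET)[0]
print(f"q = {q_star:.6f}, reward = {reward_star:.6f}")  # prints q = 0.666667, reward = 3.000000
```

In this toy model the cost from state 0 is q/(1 - q/2) and the reward is 2/(1 - q/2), so a budget of 1 gives q = 2/3 and a reward of 3. Because α = 1 on state 0, the example relies on the target being reached with probability one to keep the linear system solvable.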