Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information

0.00 Avg rating—0 Votes

Article ID:	iaor19971642
Country:	United States
Volume:	21
Issue:	3/4
Start Page Number:	267
End Page Number:	291
Publication Date:	Dec 1995
Journal:	Queueing Systems
Authors:	Stidham Shaler, Altman Eitan

Abstract:

The authors consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. The present goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of the approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward submodular. The authors further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, they make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, the authors demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. They apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, the present results imply that an optimal control policy has threshold form. Finally the authors show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, they show that, if a problem without delay satisfies the present sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotone (e.g., threshold) optimal policies.

Reviews

Required fields are marked *. Your email address will not be published.