Constrained discounted dynamic programming

Article ID: iaor20041193
Country: United States
Volume: 21
Issue: 4
Start Page Number: 922
End Page Number: 945
Publication Date: Nov 1996
Journal: Mathematics of Operations Research
Authors:
Abstract:

This paper deals with constrained optimization of Markov decision processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M actions more than a nonrandomized stationary one. This result is known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, but the total number of actions which are added by randomization is at most M, and (iii) such that the total number of actions which are added by nonstationarity is at most M. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems. We describe an algorithm to compute optimal policies with properties (i)–(iii) for constrained problems. The policies that satisfy properties (i)–(iii) have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. In contrast, a randomized stationary policy may require an infinite number of randomizations over time.
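To make the problem setting concrete, the following is a minimal Python sketch of the standard occupation-measure linear program for a finite constrained discounted MDP. It is an illustrative assumption, not the paper's algorithm (the paper treats countable state spaces and compact action sets): the state/action counts, transition probabilities, rewards, and constraint level are made up. A basic optimal solution of this LP yields a randomized stationary policy that randomizes in at most M states beyond a deterministic one, mirroring the first result above.

# Sketch: standard occupation-measure LP for a toy finite constrained
# discounted MDP (illustrative assumptions throughout; not the paper's algorithm).
import numpy as np
from scipy.optimize import linprog

n_states, n_actions = 2, 2
beta = 0.9                                # discount factor

# P[s, a, s'] = transition probability (toy data).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])    # reward to maximize
c = np.array([[0.0, 1.0], [1.0, 0.0]])    # constrained reward; require value >= 0.5 (M = 1)
alpha = np.array([1.0, 0.0])              # initial state distribution

# Variables x[s, a]: expected total discounted state-action frequencies.
# Balance constraints: sum_a x(j,a) - beta * sum_{s,a} P(s,a,j) x(s,a) = alpha(j).
A_eq = np.zeros((n_states, n_states * n_actions))
for j in range(n_states):
    for s in range(n_states):
        for a in range(n_actions):
            A_eq[j, s * n_actions + a] = (j == s) - beta * P[s, a, j]
b_eq = alpha

# Inequality constraint: expected discounted c-reward >= 0.5 (flipped for linprog's <= form).
A_ub = -c.reshape(1, -1)
b_ub = np.array([-0.5])

# Maximize the r-reward, i.e. minimize its negative.
res = linprog(-r.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(n_states, n_actions)

# Recover a randomized stationary policy; a basic optimal solution randomizes
# in at most M = 1 state beyond a deterministic policy.
policy = x / x.sum(axis=1, keepdims=True)
print("occupation measure:\n", x)
print("policy:\n", policy)

The LP view explains the "at most M extra actions" bound in the finite case: a basic feasible solution has at most (number of states) + M positive variables, so randomization can occur in at most M states.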
