Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors

Article ID: iaor20134305
Volume: 21
Issue: 2
Start Page Number: 378
End Page Number: 408
Publication Date: Jul 2013
Journal: TOP
Authors:
Keywords: programming: convex
Abstract:

In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike classical discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, we rewrite the original constrained DTMDP as a convex program in the space of occupation measures, which we show to be compact and convex. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. We then prove that the resulting convex program admits a solution expressible as a convex combination of N+1 extreme points of the space of occupation measures. A consequence is the existence of a randomized stationary optimal policy for the original constrained DTMDP.
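The occupation-measure formulation described in the abstract can be made concrete in a tiny finite setting. The sketch below is illustrative only (all states, costs, and discount values are invented, not taken from the paper): it shrinks the Borel-space DTMDP to 2 states and 2 actions, so the convex program over occupation measures becomes a finite linear program. It encodes the characterizing balance equations of an occupation measure with a state-action-dependent discount factor α(s, a), adds a single (N = 1) constraint, solves the LP with `scipy.optimize.linprog`, and normalizes the optimal measure into a randomized stationary policy.

```python
# Hypothetical finite-state sketch of the occupation-measure LP
# (illustrative numbers, not from the paper).
import numpy as np
from scipy.optimize import linprog

S, A = 2, 2
gamma0 = np.array([0.5, 0.5])            # initial state distribution
# non-constant discount factor alpha(s, a)
alpha = np.array([[0.9, 0.8], [0.7, 0.95]])
# transition kernel P[s, a, s']
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 2.0], [0.5, 3.0]])   # cost to minimize
d = np.array([[2.0, 1.0], [1.5, 0.2]])   # single constraint cost (N = 1)
kappa = 8.0                              # constraint level

# Decision vector: mu(s, a) flattened to length S*A.
# Characterizing equations of the occupation measure:
#   sum_a mu(s', a) - sum_{s,a} alpha(s,a) P(s'|s,a) mu(s,a) = gamma0(s')
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            j = s * A + a
            if s == sp:
                A_eq[sp, j] += 1.0
            A_eq[sp, j] -= alpha[s, a] * P[s, a, sp]
b_eq = gamma0

res = linprog(c=c.ravel(),
              A_ub=[d.ravel()], b_ub=[kappa],
              A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (S * A))
mu = res.x.reshape(S, A)
# A randomized stationary optimal policy is recovered by
# normalizing the optimal occupation measure over actions.
policy = mu / mu.sum(axis=1, keepdims=True)
```

With these (invented) numbers the constraint is active at the optimum, so the optimal policy genuinely randomizes, matching the abstract's convex-combination result; with a constant α the same program reduces to the classical discounted LP formulation.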
