Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes

Article ID: iaor20164230
Volume: 41
Issue: 4
Start Page Number: 1222
End Page Number: 1247
Publication Date: Nov 2016
Journal: Mathematics of Operations Research
Authors: ,
Keywords: programming: markov decision, optimization, heuristics
Abstract:

Quasi‐open‐loop policies consist of sequences of Markovian decision rules that are insensitive to one component of the state space. Given a semi‐Markov decision process (SMDP), we distinguish between exogenous and endogenous state components as follows: (i) the decision‐maker’s actions do not impact the evolution of an exogenous state component, and (ii) between consecutive decision epochs, the exogenous and endogenous state components are conditionally independent given the decision‐maker’s latest action. For simplicity, we consider an SMDP with one exogenous and one endogenous state component. When transition times between epochs are conditionally independent of the exogenous state given the most recent action, and the exogenous component is a multiplicative compound Poisson process, we provide an almost‐everywhere condition on the reward function sufficient for the optimality of a quasi‐open‐loop policy. After adjusting the discount factor to account for the statistical properties of the exogenous state process, obtaining this policy amounts to solving a reduced SMDP in which the exogenous state is static. Depending on the relationship between the structure of the exogenous state process and the shape of the reward function, we can replace the almost‐everywhere condition with one that applies only in expectation. Quasi‐open‐loop optimality holds even if the times between decision epochs depend on the Poisson process underlying the exogenous state component, and/or the Poisson process is replaced with a generic counting process.
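To make the reduction concrete, the following is a minimal numerical sketch, not the paper's construction. It assumes a toy model in which the reward is multiplicative in the exogenous state, r(x, s, a) = x·g(s, a), the exogenous component is a multiplicative compound Poisson process with jump rate λ and mean jump multiplier m (so E[X_t] = x₀·exp(λ(m−1)t)), and sojourn times are exponential with rate μ(s, a). Under these assumptions, the exogenous process can be absorbed into an adjusted discount rate α_eff = α − λ(m−1), after which the reduced problem over the endogenous state alone is solved by standard value iteration. All numbers and names below are illustrative.

```python
import numpy as np

# Hypothetical model parameters (illustrative only).
alpha = 0.10           # continuous-time discount rate
lam, m = 0.5, 1.05     # exogenous jump rate and mean jump multiplier

# Adjusted discount rate folding in the exogenous process:
# E[exp(-alpha t) X_t] = x0 * exp(-(alpha - lam*(m-1)) t).
alpha_eff = alpha - lam * (m - 1.0)
assert alpha_eff > 0, "adjusted rate must stay positive for discounting"

# Reduced SMDP over the endogenous state only (exogenous part stripped out):
# two endogenous states, two actions.
g = np.array([[1.0, 0.5],
              [0.2, 2.0]])                   # g[s, a]: reward with exogenous factor removed
mu = np.array([[1.0, 2.0],
               [1.5, 0.8]])                  # mu[s, a]: exponential sojourn-time rates
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.3, 0.7], [0.8, 0.2]]])     # P[s, a, s']: endogenous transitions

# Per-transition discount: E[exp(-alpha_eff * tau)] for an Exp(mu) sojourn.
beta = mu / (mu + alpha_eff)

# Value iteration on the reduced problem.
V = np.zeros(2)
for _ in range(10_000):
    Q = g + beta * (P @ V)                   # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)                    # quasi-open-loop: independent of x
```

The resulting `policy` depends only on the endogenous state, which is exactly the quasi-open-loop property: the exogenous state enters the value only through the static adjustment `alpha_eff`.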
