Article ID: iaor20164230
Volume: 41
Issue: 4
Start Page Number: 1222
End Page Number: 1247
Publication Date: Nov 2016
Journal: Mathematics of Operations Research
Authors: Adelman Daniel, Mancini Angelo J
Keywords: programming: markov decision, optimization, heuristics
Quasi‐open‐loop policies consist of sequences of Markovian decision rules that are insensitive to one component of the state space. Given a semi‐Markov decision process (SMDP), we distinguish between exogenous and endogenous state components as follows: (i) the decision‐maker’s actions do not impact the evolution of an exogenous state component, and (ii) between consecutive decision epochs, the exogenous and endogenous state components are conditionally independent given the decision‐maker’s latest action. For simplicity, we consider an SMDP with one exogenous and one endogenous state component. When transition times between epochs are conditionally independent of the exogenous state given the most recent action, and the exogenous component is a multiplicative compound Poisson process, we provide an almost‐everywhere condition on the reward function sufficient for the optimality of a quasi‐open‐loop policy. After adjusting the discount factor to account for the statistical properties of the exogenous state process, obtaining this policy amounts to solving a reduced SMDP in which the exogenous state is static. Depending on the relationship between the structure of the exogenous state process and the shape of the reward function, we can replace the almost‐everywhere condition with one that applies only in expectation. Quasi‐open‐loop optimality holds even if the times between decision epochs depend on the Poisson process underlying the exogenous state component, and/or the Poisson process is replaced with a generic counting process.
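A rough illustration of the discount‐factor adjustment (the notation and the linear‐reward assumption below are ours, not taken from the paper): let the exogenous component be a multiplicative compound Poisson process
\[
  S_t \;=\; S_0 \prod_{i=1}^{N_t} Y_i ,
\]
where $N_t$ is a Poisson process with rate $\lambda$ and the multipliers $Y_i$ are i.i.d. with mean $\mathbb{E}[Y]$. Conditioning on $N_t$ and using the Poisson probability generating function gives
\[
  \mathbb{E}\bigl[e^{-\alpha t} S_t\bigr]
  \;=\; S_0 \, e^{-\bigl(\alpha - \lambda(\mathbb{E}[Y]-1)\bigr) t},
\]
so if the reward were linear in the exogenous state, freezing that state at its current value $S_0$ and discounting at the effective rate $\alpha' = \alpha - \lambda(\mathbb{E}[Y]-1)$ (assuming $\alpha > \lambda(\mathbb{E}[Y]-1)$) would preserve expected discounted rewards. This is only a sketch of the spirit of the reduction: a reduced problem with a static exogenous state and an adjusted discount factor; the paper's actual sufficient conditions on the reward function are more general.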