Article ID: iaor20126267
Volume: 224
Issue: 2
Start Page Number: 333
End Page Number: 339
Publication Date: Jan 2013
Journal: European Journal of Operational Research
Authors: Li Yanjie, Cao Fang
Keywords: programming: markov decision, game theory, simulation: analysis
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Based on this formula, we develop three gradient estimation algorithms that each use only a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the algorithm previously reported in the literature.
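To give a concrete sense of what single-sample-path gradient estimation for an SMDP under the average-reward criterion looks like, the sketch below uses a standard regenerative likelihood-ratio (score-function) estimator on a made-up three-state, two-action SMDP. It is not the perturbation-analysis formula or any of the three algorithms proposed in the paper; the model data (transition kernel, sojourn-time rates, rewards), the softmax policy parameterisation, and all function names are illustrative assumptions.

```python
"""
Illustrative single-sample-path gradient estimator for a toy SMDP under the
average-reward criterion.  This is a regenerative likelihood-ratio sketch,
NOT the paper's perturbation-analysis-based algorithms; all model data below
are hypothetical.
"""
import numpy as np

rng = np.random.default_rng(0)

# Toy SMDP: 3 states, 2 actions (all numbers are illustrative).
N_STATES, N_ACTIONS = 3, 2
# P[s, a] = next-state distribution after taking action a in state s.
P = np.array([
    [[0.1, 0.6, 0.3], [0.5, 0.3, 0.2]],
    [[0.4, 0.1, 0.5], [0.2, 0.7, 0.1]],
    [[0.6, 0.2, 0.2], [0.3, 0.3, 0.4]],
])
SOJOURN_RATE = np.array([[1.0, 2.0], [1.5, 0.5], [2.0, 1.0]])  # exponential holding rates
REWARD = np.array([[1.0, 0.2], [0.5, 2.0], [0.0, 1.5]])        # lump reward per transition


def policy(theta, s):
    """Softmax policy over actions in state s, parameterised by theta[s, :]."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()


def score(theta, s, a):
    """Gradient of log pi(a | s; theta) with respect to theta."""
    g = np.zeros_like(theta)
    pi = policy(theta, s)
    g[s] = -pi
    g[s, a] += 1.0
    return g


def estimate_gradient(theta, n_cycles=5000, regen_state=0):
    """Regenerative likelihood-ratio estimate of d(average reward)/d(theta).

    The average reward is eta = E[C] / E[T], where C and T are the accumulated
    reward and total sojourn time over one regenerative cycle (a return to
    regen_state).  With S the sum of score functions over the cycle,
    grad E[C] = E[C * S] and grad E[T] = E[T * S], and grad eta follows from
    the quotient rule.
    """
    C = np.zeros(n_cycles)
    T = np.zeros(n_cycles)
    S = np.zeros((n_cycles,) + theta.shape)
    for i in range(n_cycles):
        s = regen_state
        while True:
            a = rng.choice(N_ACTIONS, p=policy(theta, s))
            S[i] += score(theta, s, a)                      # accumulate score function
            T[i] += rng.exponential(1.0 / SOJOURN_RATE[s, a])  # sojourn time in (s, a)
            C[i] += REWARD[s, a]                            # lump reward for the transition
            s = rng.choice(N_STATES, p=P[s, a])
            if s == regen_state:                            # cycle ends on regeneration
                break
    mC, mT = C.mean(), T.mean()
    gC = np.tensordot(C, S, axes=(0, 0)) / n_cycles  # estimate of grad E[C]
    gT = np.tensordot(T, S, axes=(0, 0)) / n_cycles  # estimate of grad E[T]
    eta = mC / mT
    grad_eta = (gC * mT - mC * gT) / mT ** 2
    return eta, grad_eta


if __name__ == "__main__":
    theta = np.zeros((N_STATES, N_ACTIONS))
    eta, grad = estimate_gradient(theta)
    print("estimated average reward:", eta)
    print("estimated gradient:\n", grad)
```

In this sketch only the policy depends on the parameter, so the score function involves just the action probabilities; the sojourn-time and reward data enter the estimate only through the accumulated cycle quantities. The paper's algorithms instead build on a perturbation-analysis sensitivity equation and are designed to need less storage than such approaches when run along a single sample path.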