A basic formula for performance gradient estimation of semi‐Markov decision processes

Article ID: iaor20126267
Volume: 224
Issue: 2
Start Page Number: 333
End Page Number: 339
Publication Date: Jan 2013
Journal: European Journal of Operational Research
Authors: ,
Keywords: programming: Markov decision, game theory, simulation: analysis
Abstract:

This paper presents a basic formula for performance gradient estimation of semi‐Markov decision processes (SMDPs) under the average‐reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Using it, we develop three sample‐path‐based gradient estimation algorithms, each of which needs only a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete‐time Markov systems to continuous‐time semi‐Markov models. In particular, they require less storage than the existing algorithm in the literature.
