A policy gradient method for semi-Markov decision processes with application to call admission control

Article ID: iaor2009616
Country: Netherlands
Volume: 178
Issue: 3
Start Page Number: 808
End Page Number: 818
Publication Date: May 2007
Journal: European Journal of Operational Research
Authors: , ,
Keywords: stochastic processes
Abstract:

Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach in which one optimises the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation.
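
The idea of estimating a gradient and following it online in the same loop can be illustrated with a minimal sketch. The code below is not the algorithm from the article: it is a generic score-function (likelihood-ratio) policy gradient with an eligibility trace, adapted to random sojourn times by subtracting the estimated reward rate multiplied by the elapsed time. The single-link model, the two call classes, the per-class sigmoid admission policy, and every name and parameter value (`run_admission_control`, `trace_decay`, `step_size`, the rates and rewards) are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_admission_control(steps=20000, capacity=5, arrival_rates=(1.0, 1.0),
                          service_rate=0.5, rewards=(1.0, 4.0),
                          step_size=0.01, trace_decay=0.9, seed=0):
    """Toy online policy-gradient loop for a single-link admission problem."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # one admission logit per call class (hypothetical policy)
    trace = [0.0, 0.0]   # decayed sum of score-function terms
    busy = 0             # number of channels currently in use
    total_reward = 0.0
    total_time = 0.0

    for _ in range(steps):
        # Random sojourn time until the next event (arrival or departure).
        total_rate = sum(arrival_rates) + busy * service_rate
        tau = rng.expovariate(total_rate)
        u = rng.random() * total_rate

        reward = 0.0
        score = [0.0, 0.0]
        if u < sum(arrival_rates):
            k = 0 if u < arrival_rates[0] else 1   # class of the arriving call
            if busy < capacity:
                p_admit = sigmoid(theta[k])
                admit = rng.random() < p_admit
                # d/d theta_k of log pi(action): (1 - p) if admitted, -p if rejected.
                score[k] = (1.0 if admit else 0.0) - p_admit
                if admit:
                    busy += 1
                    reward = rewards[k]
        else:
            busy -= 1                               # a call in progress ends

        # Running estimate of the average reward per unit time.
        total_reward += reward
        total_time += tau
        rate_estimate = total_reward / total_time

        # Reward net of the estimated rate times the elapsed time.
        delta = reward - rate_estimate * tau
        for i in range(2):
            trace[i] = trace_decay * trace[i] + score[i]
            theta[i] += step_size * delta * trace[i]    # stochastic approximation step
            theta[i] = max(-10.0, min(10.0, theta[i]))  # keep the logits bounded

    return theta, total_reward / total_time


if __name__ == "__main__":
    theta, rate = run_admission_control()
    print("admission logits:", theta)
    print("estimated reward rate:", rate)
```

The simulation never reads the transition probabilities directly; it only observes events, elapsed times and rewards, which is the sense in which such a method avoids needing the probabilistic model. The trace decay trades bias against variance in the gradient estimate, and the values chosen here are arbitrary.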
