A policy gradient method for semi-Markov decision processes with application to call admission control

Article ID:	iaor2009616
Country:	Netherlands
Volume:	178
Issue:	3
Start Page Number:	808
End Page Number:	818
Publication Date:	May 2007
Journal:	European Journal of Operational Research
Authors:	Singh Sumeetpal S., Tadić Vladislav B., Doucet Arnaud
Keywords:	stochastic processes

Abstract:

Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach where one optimises the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation.
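
The following is only an illustrative sketch of the general idea described in the abstract, not the authors' algorithm. It assumes a toy single-link admission problem with two call classes, a sigmoid admission policy with one parameter, and a generic score-function gradient estimate with a discounted eligibility trace. The function name `run_admission_control`, all rates, rewards, step sizes, and the clamping range are hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_admission_control(n_events=20000, seed=0, capacity=5,
                          arrival_rates=(2.0, 1.0), rewards=(1.0, 5.0),
                          service_rate=1.0, beta=0.9, step=0.01):
    """Toy online policy-gradient loop (all parameters are assumptions)."""
    rng = random.Random(seed)
    theta = 0.0  # policy parameter: soft occupancy threshold for class-0 calls
    eta = 0.0    # running estimate of the reward per unit time
    z = 0.0      # eligibility trace of score-function terms
    busy = 0     # number of channels currently in use
    for _ in range(n_events):
        rates = [arrival_rates[0], arrival_rates[1], busy * service_rate]
        total = sum(rates)
        tau = rng.expovariate(total)   # random sojourn time until next event
        u = rng.random() * total       # pick which event occurred
        reward, score = 0.0, 0.0
        if u < rates[0]:
            # Low-reward class: admit with probability sigmoid(theta - busy).
            if busy < capacity:
                p = sigmoid(theta - busy)
                if rng.random() < p:
                    busy += 1
                    reward = rewards[0]
                    score = 1.0 - p    # d/dtheta log p
                else:
                    score = -p         # d/dtheta log (1 - p)
        elif u < rates[0] + rates[1]:
            # High-reward class: always admitted when a channel is free.
            if busy < capacity:
                busy += 1
                reward = rewards[1]
        elif busy > 0:
            busy -= 1                  # a call in progress completes
        z = beta * z + score
        # Reward minus the estimated rate times elapsed time drives both updates.
        delta = reward - eta * tau
        eta += step * delta
        theta += step * delta * z      # stochastic-approximation ascent step
        theta = max(-20.0, min(20.0, theta))  # keep the parameter bounded
    return theta, eta


if __name__ == "__main__":
    theta, eta = run_admission_control()
    print("theta:", theta, "estimated reward rate:", eta)
```

The sketch updates the policy parameter and the reward-rate estimate at every event of a single simulated trajectory, so no transition model is stored or solved for. The trace discount `beta` trades bias against variance in the gradient estimate, which is a design choice of this example rather than something stated in the abstract.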
