Actor-critic-type learning algorithms for Markov decision processes

Article ID: iaor2002910
Country: United States
Volume: 38
Issue: 1
Start Page Number: 94
End Page Number: 123
Publication Date: Dec 1999
Journal: SIAM Journal on Control and Optimization
Authors: ,
Keywords: learning
Abstract:

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known ‘actor-critic’ (or ‘adaptive critic’) algorithm in the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis involves two time scale stochastic approximations.
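
The two-time-scale idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a minimal single-process, tabular, discounted-reward actor-critic, and everything specific in it is an assumption made for illustration (the toy simulator `simulate`, the discount factor 0.9, the softmax policy, and the step-size exponents 0.6 and 0.9). The only point it tries to show is that the critic (value estimate) uses a larger step size than the actor (policy parameters), so the ratio of actor to critic step sizes tends to zero. The distributed asynchronous implementations the abstract refers to are not modeled here.

```python
import math
import random

GAMMA = 0.9                  # discount factor (assumed for this sketch)
N_STATES, N_ACTIONS = 2, 2   # hypothetical toy problem size


def simulate(state, action, rng):
    """Hypothetical simulator: action 1 pays reward 1, action 0 pays 0.
    The next state is uniform at random, independent of the action."""
    reward = 1.0 if action == 1 else 0.0
    next_state = rng.randrange(N_STATES)
    return reward, next_state


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=50000, seed=0):
    rng = random.Random(seed)
    value = [0.0] * N_STATES                               # critic
    prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor
    state = 0
    for n in range(1, steps + 1):
        critic_step = n ** -0.6   # faster time scale
        actor_step = n ** -0.9    # slower time scale; ratio to critic -> 0
        probs = softmax(prefs[state])
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        reward, next_state = simulate(state, action, rng)

        # Critic: temporal-difference update from one simulated transition.
        td_error = reward + GAMMA * value[next_state] - value[state]
        value[state] += critic_step * td_error

        # Actor: move preferences along the score function, scaled by the
        # TD error, which serves as a noisy estimate of the advantage.
        for a in range(N_ACTIONS):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[state][a] += actor_step * td_error * grad

        state = next_state
    return [softmax(p) for p in prefs], value


if __name__ == "__main__":
    policy, value = train()
    print("policy:", policy)
    print("value:", value)
```

In this toy problem the rewarded action is the same in every state, so the learned policy should come to favor action 1. Both step-size sequences are chosen to have a divergent sum and a convergent sum of squares, which is the usual requirement in stochastic approximation.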
