| Article ID: | iaor2002910 |
| Country: | United States |
| Volume: | 38 |
| Issue: | 1 |
| Start Page Number: | 94 |
| End Page Number: | 123 |
| Publication Date: | Dec 1999 |
| Journal: | SIAM Journal on Control and Optimization |
| Authors: | Borkar V.S., Konda V.R. |
| Keywords: | learning |
Algorithms for learning the optimal policy of a Markov decision process (MDP) from simulated transitions are formulated and analyzed. These are variants of the well-known ‘actor-critic’ (or ‘adaptive critic’) algorithm from the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis rests on two-time-scale stochastic approximation.
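The two-time-scale idea can be illustrated with a minimal tabular sketch: the critic (value estimates) is updated on a fast stepsize schedule and the actor (policy parameters) on a slower one, so that the actor sees a nearly converged critic. This is not the authors' exact scheme; the toy MDP, stepsize exponents, and softmax parameterization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (2 states, 2 actions), assumed for illustration:
# action 0 always yields reward 1, action 1 yields 0; next state is uniform.
n_states, n_actions, gamma = 2, 2, 0.9
R = np.array([[1.0, 0.0],
              [1.0, 0.0]])          # R[s, a]

def softmax(z):
    z = z - z.max()                  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

V = np.zeros(n_states)               # critic: state-value estimates
theta = np.zeros((n_states, n_actions))  # actor: policy parameters

s = 0
for n in range(1, 20001):
    beta = 1.0 / n ** 0.6            # fast stepsize (critic)
    alpha = 1.0 / n                  # slow stepsize (actor): alpha/beta -> 0
    pi = softmax(theta[s])
    a = rng.choice(n_actions, p=pi)  # simulate a transition
    r = R[s, a]
    s_next = int(rng.integers(n_states))
    delta = r + gamma * V[s_next] - V[s]  # TD(0) error
    V[s] += beta * delta             # critic update (fast time scale)
    grad = -pi.copy()
    grad[a] += 1.0                   # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += alpha * delta * grad # actor update (slow time scale)
    s = s_next
```

After training, the policy in each state should place most of its probability on the rewarding action 0, with the critic's values approximating the discounted return under the improving policy.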