A stochastic approximation for parameters of the Markov decision processes

Article ID:	iaor20051045
Country:	China
Volume:	25
Issue:	5
Start Page Number:	377
End Page Number:	380
Publication Date:	Sep 2003
Journal:	Journal of Yunnan University
Authors:	Hu Guanghua

Abstract:

A stochastic gradient algorithm is proposed for the average reward of a Markov decision process that depends on a parameter vector. A new gradient formula for the objective function is given, and a stochastic approximation algorithm based on a single sample path is presented. Finally, convergence of the gradient with probability 1 is proved.
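
The record gives no formulas, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes a toy two-state chain in which a scalar parameter theta sets the probability sigmoid(theta) of moving to the rewarding state, and it swaps in a standard score-function (likelihood-ratio) estimator with a discounted eligibility trace to estimate the gradient of the average reward from one simulated path. The names `estimate_gradient`, `stochastic_approximation`, `beta`, and the step sizes 1/(k+1) are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_gradient(theta, steps, beta, rng):
    """Estimate d(average reward)/d(theta) from one simulated sample path.

    Toy model (an assumption, not from the paper): at every step the chain
    moves to state 1 with probability sigmoid(theta), otherwise to state 0,
    and the reward is 1 in state 1 and 0 in state 0.
    """
    p = sigmoid(theta)
    trace = 0.0  # discounted sum of past score terms (eligibility trace)
    grad = 0.0   # running average of reward * trace
    for t in range(steps):
        action = 1 if rng.random() < p else 0
        # Score term: derivative of log P(action) with respect to theta.
        score = (1.0 - p) if action == 1 else -p
        trace = beta * trace + score
        reward = float(action)  # the next state equals the chosen action
        grad += (reward * trace - grad) / (t + 1)
    return grad


def stochastic_approximation(theta0, iterations, steps, beta, seed):
    """Gradient ascent on theta with decreasing step sizes 1/(k+1)."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(iterations):
        gamma = 1.0 / (k + 1)
        theta += gamma * estimate_gradient(theta, steps, beta, rng)
    return theta
```

In this toy model the average reward equals sigmoid(theta), so its exact gradient at theta = 0 is 0.25, which gives a simple check on the estimator. The step sizes 1/(k+1) are chosen because they sum to infinity while their squares have a finite sum, the usual condition in stochastic approximation.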
