The authors consider a class of Markov decision processes with finite state and action spaces determined, essentially, by the following condition: the state space is irreducible under every stationary policy. Apart from this restriction, the transition law is completely unknown to the controller. In this context, the authors identify a set of policies under which the frequency estimators of the transition law are strongly consistent, and they then apply this result to construct adaptive asymptotically discount-optimal policies.
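The frequency estimator referred to above is the empirical relative frequency of observed transitions: the estimate of p(s' | s, a) is the number of observed transitions from state s under action a to state s', divided by the number of times the pair (s, a) was used. The sketch below illustrates this idea only; the function name and the uniform fallback for unvisited state-action pairs are assumptions of this illustration, not part of the paper.

```python
from collections import defaultdict

def frequency_estimator(transitions, states):
    """Estimate p(s' | s, a) by relative frequencies of observed transitions.

    `transitions` is a list of observed (s, a, s') triples; `states` is the
    finite state space. Unvisited (s, a) pairs fall back to a uniform
    estimate (an arbitrary convention chosen for this sketch).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1

    def p_hat(s, a, s_next):
        total = sum(counts[(s, a)].values())
        if total == 0:
            return 1.0 / len(states)  # no data for (s, a): uniform guess
        return counts[(s, a)][s_next] / total

    return p_hat
```

Strong consistency then means that, under the policies the authors identify, every state-action pair is visited infinitely often, so these relative frequencies converge almost surely to the true transition probabilities.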