Article ID: iaor20061366
Country: United States
Volume: 30
Issue: 3
Start Page Number: 545
End Page Number: 561
Publication Date: Aug 2005
Journal: Mathematics of Operations Research
Authors: Tsitsiklis John N., Mannor Shie
We consider the empirical state–action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
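As an informal illustration of the object the abstract studies, the sketch below simulates a small, entirely hypothetical 2-state, 2-action MDP (not taken from the paper) under a fixed randomized stationary policy and computes the empirical state–action frequency vector, i.e., the fraction of time each state–action pair is visited. Under a stationary policy the induced chain's stationary distribution is one element of the limit polytope the abstract describes, so the empirical frequencies should settle near it for large horizons.

```python
import random
from collections import Counter

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[s][a] is the list of next-state probabilities after taking action a in state s.
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}

# A fixed randomized stationary policy: policy[s][a] = probability of action a in state s.
policy = {0: [0.5, 0.5], 1: [0.7, 0.3]}

def empirical_frequencies(T, seed=0):
    """Simulate T steps from state 0 and return the empirical
    state-action frequency vector as a dict {(s, a): fraction of time}."""
    rng = random.Random(seed)
    counts = Counter()
    s = 0
    for _ in range(T):
        a = rng.choices([0, 1], weights=policy[s])[0]
        counts[(s, a)] += 1
        s = rng.choices([0, 1], weights=P[s][a])[0]
    return {sa: c / T for sa, c in counts.items()}

freqs = empirical_frequencies(100_000)
```

The entries of `freqs` sum to 1 by construction, and for this policy the marginal frequency of state 0 should be close to the stationary probability of state 0 in the induced Markov chain (about 0.458 for the numbers above), consistent with the strong convergence to the polytope asserted in the abstract.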