On the empirical state–action frequencies in Markov decision processes under general policies

Article ID: iaor20061366
Country: United States
Volume: 30
Issue: 3
Start Page Number: 545
End Page Number: 561
Publication Date: Aug 2005
Journal: Mathematics of Operations Research
Authors: ,
Abstract:

We consider the empirical state–action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distance between the empirical frequency vector and the polytope decays exponentially with time under every policy. We provide similar results for vector-valued empirical rewards.
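To make the object of study concrete, the following is a minimal sketch (not from the paper) of computing the empirical state–action frequency vector on a hypothetical 2-state, 2-action MDP under a fixed randomized policy; the transition kernel, policy, and horizon are all illustrative assumptions.

```python
import random

def empirical_frequencies(T, seed=0):
    """Simulate a toy MDP for T steps and return the empirical
    state-action frequency vector f_T(s, a) = N_T(s, a) / T."""
    # Hypothetical transition kernel: P[s][a] = (prob next state 0, prob next state 1)
    P = {0: {0: (0.9, 0.1), 1: (0.2, 0.8)},
         1: {0: (0.5, 0.5), 1: (0.7, 0.3)}}
    rng = random.Random(seed)
    counts = {(s, a): 0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for _ in range(T):
        a = rng.choice((0, 1))                 # stationary uniform policy (an assumption)
        counts[(s, a)] += 1
        s = 0 if rng.random() < P[s][a][0] else 1
    return {sa: c / T for sa, c in counts.items()}

f = empirical_frequencies(10_000)
# f is a nonnegative vector over state-action pairs summing to 1,
# i.e., a point whose limit behavior the paper characterizes via a polytope.
```

Under a stationary policy such as this one, the frequency vector converges to a point of the polytope of stationary state–action distributions; the paper's contribution concerns what can happen under *general* (history-dependent) policies.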
