Comparing policies in Markov decision processes: Mandl’s lemma revisited


Article ID: iaor1990707
Country: United States
Volume: 15
Issue: 1
Start Page Number: 1
End Page Number: 7
Publication Date: Feb 1990
Journal: Mathematics of Operations Research
Authors: ,
Abstract:

A general framework is developed for comparing the long-run average cost of a Markov stationary policy with that of another related policy. The underlying methodology extends some ideas of Mandl to randomized policies and to Polish state and action spaces. Sufficient conditions for the applicability of the methodology are given. These conditions are easy to verify and have a natural probabilistic interpretation in terms of the ‘stability’ of the chain and the convergence of the control values. The usefulness of the proposed framework is illustrated through several applications. Standard results on the convergence of adaptive policies are readily recovered under conditions that are more transparent than those in the existing literature, and the convergence of randomized policies is handled as a special case. Finally, a novel application to ‘probing controls’ is outlined.
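The quantity being compared is the long-run average cost of a stationary policy π, J(π) = limsup_{n→∞} (1/n) E^π[ Σ_{t=0}^{n−1} c(X_t, U_t) ]. The sketch below is a minimal illustration that assumes finite state and action spaces and an induced Markov chain with a unique stationary distribution μ_π. It does not assume the Polish spaces treated in the paper. Under that assumption, J(π) = Σ_x μ_π(x) Σ_a π(a|x) c(x, a). The code shows how to evaluate and compare deterministic and randomized stationary policies on a toy problem. It is not Mandl's lemma or the authors' framework. The transition data, costs, and function names are hypothetical.

```python
# Sketch: long-run average cost of stationary (possibly randomized) policies
# on a small finite MDP. Assumes the chain induced by each policy has a unique
# stationary distribution (a single recurrent class); all data are hypothetical.

def induced_chain(P, policy):
    """P[a][x][y]: transition prob x -> y under action a; policy[x][a]: prob of a in x."""
    n_states, n_actions = len(policy), len(P)
    return [[sum(policy[x][a] * P[a][x][y] for a in range(n_actions))
             for y in range(n_states)] for x in range(n_states)]

def stationary_distribution(Q):
    """Solve mu Q = mu, sum(mu) = 1 by Gaussian elimination with partial pivoting."""
    n = len(Q)
    # Equation y: sum_x mu[x] * (Q[x][y] - [x == y]) = 0. One balance equation is
    # redundant, so the last one is replaced by the normalization sum_x mu[x] = 1.
    A = [[Q[x][y] - (1.0 if x == y else 0.0) for x in range(n)] for y in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    mu = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        mu[r] = (b[r] - sum(A[r][k] * mu[k] for k in range(r + 1, n))) / A[r][r]
    return mu

def average_cost(P, c, policy):
    """J(policy) = sum_x mu(x) * sum_a policy[x][a] * c[x][a]."""
    mu = stationary_distribution(induced_chain(P, policy))
    c_pi = [sum(p * cost for p, cost in zip(policy[x], c[x])) for x in range(len(policy))]
    return sum(m * cx for m, cx in zip(mu, c_pi))

# Hypothetical 2-state, 2-action example.
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # action 0: stay put w.p. 0.9
    [[0.1, 0.9], [0.9, 0.1]],  # action 1: switch state w.p. 0.9
]
c = [[0.0, 0.5], [1.0, 1.5]]   # c[x][a] = x + 0.5 * a

always_stay = [[1.0, 0.0], [1.0, 0.0]]
stay_then_switch = [[1.0, 0.0], [0.0, 1.0]]
randomized = [[1.0, 0.0], [0.5, 0.5]]  # mixes the two policies in state 1

if __name__ == "__main__":
    for name, pol in [("always_stay", always_stay),
                      ("stay_then_switch", stay_then_switch),
                      ("randomized", randomized)]:
        print(f"{name}: {average_cost(P, c, pol):.4f}")
    # prints 0.5000, 0.1500 and 0.2083 respectively
```

The randomized policy's cost (5/6 · 0 + 1/6 · 1.25 ≈ 0.2083) is not the midpoint of the two deterministic costs. Randomizing changes the stationary distribution as well as the one-step cost, so comparing policies requires looking at the whole chain and not only the per-step costs.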
