Blackwell optimality in the class of stationary policies in Markov decision chains with a Borel state space and unbounded rewards

Article ID:	iaor20001035
Country:	Germany
Volume:	49
Issue:	1
Start Page Number:	1
End Page Number:	39
Publication Date:	Jan 1999
Journal:	Mathematical Methods of Operations Research (Heidelberg)
Authors:	Hordijk A., Yushkevich A.A.

Abstract:

This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove the existence of deterministic stationary policies that are Blackwell optimal in the class of all stationary policies, including randomized ones. We also establish a lexicographical policy improvement algorithm leading to Blackwell optimal policies, and the relation between such policies and the Blackwell optimality equation. Our technique combines the weighted norms approach developed in Dekker and Hordijk for countable models with unbounded rewards with the weak–strong topology approach used in Yushkevich for Borel models with bounded rewards.
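
For orientation, the notion of Blackwell optimality within the class of stationary policies can be sketched as follows. This is a commonly used formulation, not a quotation from the paper: the symbols $X$, $\Pi_s$, $r$, $v_\beta^\pi$ and $\beta_0$ are notation assumed here for illustration, and the paper's exact definition (for instance, whether the threshold $\beta_0$ may depend on the state and on the competing policy) may differ.

```latex
% Assumed notation: X is the Borel state space, \Pi_s the class of all
% (possibly randomized) stationary policies, r the one-step reward.
% Expected discounted reward of policy \pi from initial state x:
\[
  v_\beta^{\pi}(x) \;=\; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r(x_t, a_t)\right],
  \qquad 0 < \beta < 1 .
\]
% A stationary policy \varphi is Blackwell optimal in \Pi_s if, for every
% x \in X and every \pi \in \Pi_s, it is at least as good as \pi for all
% discount factors sufficiently close to 1:
\[
  \exists\, \beta_0(x,\pi) < 1 \ \text{such that}\quad
  v_\beta^{\varphi}(x) \;\ge\; v_\beta^{\pi}(x)
  \quad \text{for all } \beta \in (\beta_0(x,\pi),\, 1).
\]
```

With unbounded rewards, the expectation above must first be shown to be finite; the weighted norms mentioned in the abstract serve that purpose.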