An improved algorithm for solving communicating average reward Markov decision processes

Article ID:	iaor19911686
Country:	Switzerland
Volume:	28
Start Page Number:	229
End Page Number:	242
Publication Date:	Apr 1991
Journal:	Annals of Operations Research
Authors:	Haviv M., Puterman M.L.

Abstract:

This paper provides a policy iteration algorithm for solving communicating Markov decision processes (MDPs) under the average reward criterion. The algorithm is based on the result that every communicating MDP has an optimal policy which is unichain. The improvement step is modified to select only unichain policies; consequently, the nested optimality equations of Howard’s multichain policy iteration algorithm are avoided. Properties and advantages of the algorithm are discussed, and it is incorporated into a decomposition algorithm for solving multichain MDPs. Since it is easier to show that a problem is communicating than to show that it is unichain, the authors recommend using this algorithm instead of unichain policy iteration.
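
The remark that the communicating property is easy to verify can be illustrated with a short sketch. It is not taken from the paper. It relies on the standard characterization that a finite MDP is communicating exactly when the directed graph with an edge from s to t whenever some action gives a positive transition probability is strongly connected. The function name `is_communicating` and the array layout `P[a, s, t]` (with every action available in every state) are assumptions made for this illustration.

```python
import numpy as np

def is_communicating(P, tol=0.0):
    """Return True if the finite MDP is communicating.

    P is assumed (hypothetical layout) to have shape (A, S, S), where
    P[a, s, t] is the probability of moving from s to t under action a.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[1]
    # Edge s -> t exists if at least one action makes the move possible.
    adj = (P > tol).any(axis=0)

    def reaches_all(a):
        # Depth-first search from state 0 over adjacency matrix a.
        seen = {0}
        stack = [0]
        while stack:
            s = stack.pop()
            for t in np.flatnonzero(a[s]):
                t = int(t)
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return len(seen) == n

    # Strongly connected: state 0 reaches every state in the graph
    # and in the reversed graph.
    return reaches_all(adj) and reaches_all(adj.T)

# Two states; action 0 stays put, action 1 switches states.
# The "always stay" policy is multichain, yet the MDP is communicating.
P_comm = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]]

# One action; state 1 is absorbing, so state 0 cannot be reached from it.
P_not = [[[0.5, 0.5], [0.0, 1.0]]]
```

The check needs only the supports of the transition probabilities and two graph searches. Verifying the unichain property, by contrast, concerns the chain structure of every stationary deterministic policy, not just the union graph.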
