Algorithms for aggregated limiting average Markov decision problems

Algorithms for aggregated limiting average Markov decision problems

0.00 Avg rating0 Votes
Article ID: iaor2003704
Country: Germany
Volume: 53
Issue: 3
Start Page Number: 451
End Page Number: 463
Publication Date: Jan 2001
Journal: Mathematical Methods of Operations Research (Heidelberg)
Authors: ,
Abstract:

We consider discrete time Markov Decision Process (MDP) with finite state and action spaces under average reward optimality criterion. The decomposition theory, in Ross and Varadarajan, leads to a natural partition of the state space into strongly communicating classes and a set of states that are transient under all stationary strategies. Then, an optimal pure strategy can be obtained from an optimal strategy for some smaller aggregated MDP. This decomposition gives an efficient method for solving large-scale MDPs. In this paper, we consider deterministic MDPs and we construct a simple algorithm, based on graph theory, to determine an aggregated optimal policy. In the case of MDPs without cycles, we propose an algorithm for computing aggregated optimal strategies. In the general case, we propose some new improving algorithms for computing aggregated optimal strategies.

Reviews

Required fields are marked *. Your email address will not be published.