Article ID: | iaor2003704 |
Country: | Germany |
Volume: | 53 |
Issue: | 3 |
Start Page Number: | 451 |
End Page Number: | 463 |
Publication Date: | Jan 2001 |
Journal: | Mathematical Methods of Operations Research (Heidelberg) |
Authors: | Abbad M., Daoui C. |
We consider discrete time Markov Decision Process (MDP) with finite state and action spaces under average reward optimality criterion. The decomposition theory, in Ross and Varadarajan, leads to a natural partition of the state space into strongly communicating classes and a set of states that are transient under all stationary strategies. Then, an optimal pure strategy can be obtained from an optimal strategy for some smaller aggregated MDP. This decomposition gives an efficient method for solving large-scale MDPs. In this paper, we consider deterministic MDPs and we construct a simple algorithm, based on graph theory, to determine an aggregated optimal policy. In the case of MDPs without cycles, we propose an algorithm for computing aggregated optimal strategies. In the general case, we propose some new improving algorithms for computing aggregated optimal strategies.