The Simplex and Policy‐Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

Article ID: iaor201111563
Volume: 36
Issue: 4
Start Page Number: 593
End Page Number: 603
Publication Date: Nov 2011
Journal: Mathematics of Operations Research
Authors:
Keywords: programming: Markov decision
Abstract:

We prove that the classic policy‐iteration method and the original simplex method with the most‐negative‐reduced‐cost pivoting rule of Dantzig are strongly polynomial‐time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy‐iteration and simplex methods is superior to that of the only known strongly polynomial‐time interior‐point algorithm for solving this problem. The result is surprising because the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming problem, the simplex method with the smallest index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates, and the policy‐iteration method was recently shown to be exponential for solving undiscounted MDPs under the average cost criterion. We also extend the result to solving MDPs with transient substochastic transition matrices whose spectral radii are uniformly below one.
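The abstract refers to the classic policy-iteration method for discounted MDPs. For concreteness, here is a minimal sketch of that method for a finite MDP under the discounted-cost criterion; the function name, the tiny example MDP, and the cost-minimization convention are illustrative choices, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, c, gamma):
    """Classic (Howard) policy iteration for a discounted-cost finite MDP.

    P:     array of shape (num_actions, n, n); P[a] is the transition matrix
           under action a.
    c:     array of shape (num_actions, n); c[a, i] is the immediate cost of
           taking action a in state i.
    gamma: discount factor in [0, 1).

    Returns (policy, v): an optimal action index per state and its value vector.
    """
    num_actions, n, _ = P.shape
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = c_pi exactly.
        P_pi = P[policy, np.arange(n)]   # row i is P[policy[i], i, :]
        c_pi = c[policy, np.arange(n)]
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, c_pi)
        # Policy improvement: switch every state to a greedy (one-step
        # lookahead) action; this is the all-states-at-once update of the
        # classic method.
        q = c + gamma * (P @ v)          # shape (num_actions, n)
        new_policy = q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

A quick usage sketch on a hypothetical two-state, two-action MDP:

```python
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
              [[0.5, 0.5], [0.7, 0.3]]])  # transitions under action 1
c = np.array([[1.0, 2.0],                 # costs under action 0
              [3.0, 0.5]])                # costs under action 1
policy, v = policy_iteration(P, c, gamma=0.9)
```

At termination the returned value vector satisfies the Bellman optimality equation, i.e. v equals the minimum over actions of the one-step lookahead cost.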
