Policy iteration for robust nonstationary Markov decision processes

Article ID: iaor20163749
Volume: 10
Issue: 8
Start Page Number: 1613
End Page Number: 1628
Publication Date: Dec 2016
Journal: Optimization Letters
Authors:
Keywords: optimization, programming: Markov decision, heuristics
Abstract:

Policy iteration is a well‐studied algorithm for solving stationary Markov decision processes (MDPs). It has also been extended to robust stationary MDPs. For robust nonstationary MDPs, however, an ‘as is’ execution of this algorithm is not possible because it would call for an infinite amount of computation in each iteration. We therefore present a policy iteration algorithm for robust nonstationary MDPs, which performs finitely implementable approximate variants of policy evaluation and policy improvement in each iteration. We prove that the sequence of cost‐to‐go functions produced by this algorithm monotonically converges pointwise to the optimal cost‐to‐go function; the policies generated converge subsequentially to an optimal policy.
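To make the ideas in the abstract concrete, here is a minimal sketch of robust policy iteration on a *stationary*, finite-state MDP whose transition kernel is only known to lie in a finite uncertainty set {P_1, …, P_k}. This is an illustrative simplification, not the paper's nonstationary algorithm: the evaluation step is the "finitely implementable approximate" variant (truncated fixed-point iteration), and all names and the uncertainty-set structure are assumptions for the example.

```python
# Hypothetical sketch of robust policy iteration on a finite MDP.
# The true transition kernel is only known to lie in a finite
# uncertainty set; evaluation and improvement both guard against
# the worst-case kernel. Not the paper's nonstationary algorithm.
import numpy as np

def robust_policy_iteration(kernels, cost, gamma=0.9, tol=1e-8):
    """kernels: list of (S, A, S) transition arrays (uncertainty set);
    cost: (S, A) array of per-stage costs; gamma: discount factor."""
    n_states, n_actions = cost.shape
    policy = np.zeros(n_states, dtype=int)
    states = np.arange(n_states)
    while True:
        # Approximate robust policy evaluation: iterate the robust
        # Bellman operator for the current policy until the residual
        # drops below tol (a finitely implementable approximation).
        v = np.zeros(n_states)
        for _ in range(10_000):
            # Worst case over the uncertainty set, state by state.
            ev = np.max([P[states, policy] @ v for P in kernels], axis=0)
            v_new = cost[states, policy] + gamma * ev
            if np.max(np.abs(v_new - v)) < tol:
                v = v_new
                break
            v = v_new
        # Robust policy improvement: greedy w.r.t. worst-case Q-values.
        q = cost + gamma * np.max([P @ v for P in kernels], axis=0)
        new_policy = np.argmin(q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```

In the nonstationary setting of the paper, the data (costs, kernels, uncertainty sets) vary with the time stage, so the loop above cannot be run "as is": each iteration would touch infinitely many stages. The paper's contribution is precisely a variant in which both steps are truncated to finite computations while preserving monotone pointwise convergence of the cost-to-go functions.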
