Finding the K best policies in a finite-horizon Markov decision process

Finding the K best policies in a finite-horizon Markov decision process

0.00 Avg rating0 Votes
Article ID: iaor20084091
Country: Netherlands
Volume: 175
Issue: 2
Start Page Number: 1164
End Page Number: 1179
Publication Date: Dec 2006
Journal: European Journal of Operational Research
Authors: ,
Keywords: programming: dynamic
Abstract:

Directed hypergraphs represent a general modelling and algorithmic tool, which have been successfully used in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action space and present an algorithm for finding the K best deterministic Markov policies. That is, we are interested in ranking the first K deterministic Markov policies in non-decreasing order using an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and be solved in linear time, with respect to the input data representing the MDP, using different additive optimality criteria.

Reviews

Required fields are marked *. Your email address will not be published.