Non-discounted optimal policies in controlled Markov set-chains

Article ID: iaor20003042
Country: Japan
Volume: 42
Issue: 3
Start Page Number: 256
End Page Number: 267
Publication Date: Sep 1999
Journal: Journal of the Operations Research Society of Japan
Authors: ,
Keywords: Markov processes, optimization, programming: dynamic
Abstract:

This paper develops interval techniques for studying non-homogeneous Markov decision processes with uncertain transition matrices. The uncertain transition matrix Q is described by the interval Q̲ ≤ Q ≤ Q̄, whose lower and upper bounds Q̲ and Q̄ are assumed to be determined by the decision maker from data and experience. In this approximation model, called a controlled Markov set-chain, the set of total expected rewards discounted by β (0 < β < 1) is shown to be a closed interval, and its behavior is analysed as β approaches 1 under a regularity condition. A non-discounted optimal policy is also found, which maximizes the Abel sum of the rewards over time among all stationary policies under a partial order. Some computational work is included. As a numerical example, a Markov set-chain version of the Toymaker's problem is solved.
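
To make the interval model concrete, the following Python sketch (not from the paper; the function names, the toy numbers, and the greedy inner optimization are illustrative assumptions) runs robust and optimistic value iteration over a rectangular interval transition set Q̲ ≤ Q ≤ Q̄, producing per-state lower and upper endpoints of the discounted value interval for a fixed β < 1. The paper's non-discounted (β → 1) analysis and its partial-order comparison of stationary policies are not reproduced here; the 2-state, 2-action data are only loosely in the spirit of the Toymaker's problem.

import numpy as np

def extreme_expectation(lo, hi, v, worst=True):
    # Choose a transition row p with lo <= p <= hi and sum(p) = 1 that
    # minimizes (worst=True) or maximizes (worst=False) p @ v, by greedily
    # allocating the remaining probability mass in order of v.
    p = lo.copy()
    budget = 1.0 - p.sum()
    order = np.argsort(v) if worst else np.argsort(-v)
    for s in order:
        add = min(budget, hi[s] - lo[s])
        p[s] += add
        budget -= add
        if budget <= 1e-12:
            break
    return p @ v

def interval_value_iteration(r, Q_lo, Q_hi, beta, tol=1e-8, max_iter=10000):
    # Return per-state [lower, upper] endpoints of the optimal
    # beta-discounted value over the interval transition set.
    # r has shape (S, A); Q_lo and Q_hi have shape (S, A, S).
    S, A = r.shape
    bounds = []
    for worst in (True, False):          # worst -> lower endpoint, best -> upper
        v = np.zeros(S)
        for _ in range(max_iter):
            q = np.array([[r[s, a] + beta * extreme_expectation(
                              Q_lo[s, a], Q_hi[s, a], v, worst)
                           for a in range(A)] for s in range(S)])
            v_new = q.max(axis=1)        # optimize over actions
            delta = np.max(np.abs(v_new - v))
            v = v_new
            if delta < tol:
                break
        bounds.append(v)
    return bounds[0], bounds[1]

# Hypothetical 2-state, 2-action instance (numbers are illustrative,
# not taken from the paper).
r = np.array([[6.0, 4.0], [-3.0, -5.0]])
Q_lo = np.array([[[0.4, 0.4], [0.7, 0.1]],
                 [[0.3, 0.5], [0.6, 0.2]]])
Q_hi = np.array([[[0.6, 0.6], [0.9, 0.3]],
                 [[0.5, 0.7], [0.8, 0.4]]])
lower, upper = interval_value_iteration(r, Q_lo, Q_hi, beta=0.9)
print("lower endpoints:", lower)
print("upper endpoints:", upper)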
