Markov decision processes with noise-corrupted and delayed state observations

Article ID:	iaor20002333
Country:	United Kingdom
Volume:	50
Issue:	6
Start Page Number:	660
End Page Number:	668
Publication Date:	Jun 1999
Journal:	Journal of the Operational Research Society
Authors:	White C.C., Bander J.L.
Keywords:	programming: dynamic

Abstract:

We consider the partially observed Markov decision process with observations delayed by k time periods. We show that at stage t, a sufficient statistic is the probability distribution of the underlying system state at stage t – k and all actions taken from stage t – k through stage t – 1. We show that improved observation quality and/or reduced data delay will not decrease the optimal expected total discounted reward, and we explore the optimality conditions for three important special cases. We present a measure of the marginal value of receiving state observations delayed by (k – 1) stages rather than delayed by k stages. We show that in the limit as k→∞ the problem is equivalent to the completely unobserved case. We present numerical examples which illustrate the value of receiving state information delayed by k stages.
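As an illustration of why the statistic named in the abstract carries enough information, the sketch below pushes a distribution over the state at stage t – k forward through the transitions induced by the actions taken from stage t – k through stage t – 1, which yields a distribution over the state at stage t. This is not the authors' algorithm: the function name `propagate_belief`, the two-state model, and the action labels `stay` and `reset` are hypothetical, chosen only to make the example concrete.

```python
def propagate_belief(belief, actions, transition):
    """Push a state distribution forward through the transition
    matrix of each action, in the order the actions were taken."""
    for action in actions:
        matrix = transition[action]  # matrix[i][j] = P(next = j | current = i, action)
        n = len(matrix)
        belief = [
            sum(belief[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
    return belief


# Hypothetical two-state, two-action model (an assumption, not from the paper).
TRANSITION = {
    "stay": [[0.9, 0.1], [0.2, 0.8]],
    "reset": [[1.0, 0.0], [1.0, 0.0]],
}

# Distribution over the state at stage t - k, here with delay k = 2.
delayed_belief = [0.5, 0.5]
# Actions taken at stages t - k through t - 1.
actions_since = ["stay", "reset"]

current_belief = propagate_belief(delayed_belief, actions_since, TRANSITION)
```

In this toy model the `reset` action sends every state to the first state, so the delayed distribution stops mattering once it is taken. In general, both the delayed distribution and the intervening actions affect the result.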
