Variance-penalized Markov decision processes

Article ID: iaor1988748
Country: United States
Volume: 14
Issue: 1
Start Page Number: 147
End Page Number: 161
Publication Date: Feb 1989
Journal: Mathematics of Operations Research
Authors: , ,
Abstract:

The authors consider a Markov decision process under both the expected limiting average and the discounted total return criteria, each modified to include a penalty for the variability in the stream of rewards. In both cases they formulate nonlinear programs in the space of state-action frequencies (averaged or discounted) whose optimal solutions are shown to be related to the optimal policies of the corresponding ‘variance-penalized MDP’. The analysis of one of the discounted cases is facilitated by the introduction of a ‘Cartesian product of two independent MDPs’.
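
To illustrate what an objective over state-action frequencies with a variance penalty might look like in the limiting-average case, here is a minimal sketch. The abstract does not state the exact nonlinear program, so the form used here (mean reward minus a weight times the variance of the one-step reward under the frequencies) is an assumption, as are the names `variance_penalized_objective`, `x`, `r`, and `lam`. The sketch only evaluates the objective for given frequencies; it does not check that the frequencies satisfy the MDP's balance constraints, nor does it solve the optimization.

```python
def variance_penalized_objective(x, r, lam):
    """Evaluate an assumed variance-penalized objective.

    x   : dict mapping (state, action) -> long-run frequency (sums to 1)
    r   : dict mapping (state, action) -> one-step reward
    lam : non-negative penalty weight (hypothetical parameter name)

    Assumed form: mean - lam * variance, where both moments are taken
    with respect to the state-action frequencies x.
    """
    mean = sum(x[k] * r[k] for k in x)
    second_moment = sum(x[k] * r[k] ** 2 for k in x)
    variance = second_moment - mean ** 2
    return mean - lam * variance


# Hypothetical frequencies induced by two stationary policies.
# Policy A alternates between rewards 0 and 2 (higher mean, high variability).
x_a = {("s0", "a"): 0.5, ("s1", "a"): 0.5}
r_a = {("s0", "a"): 0.0, ("s1", "a"): 2.0}

# Policy B stays in one state with a steady reward of 0.8.
x_b = {("s0", "b"): 1.0}
r_b = {("s0", "b"): 0.8}

score_a = variance_penalized_objective(x_a, r_a, lam=0.5)
score_b = variance_penalized_objective(x_b, r_b, lam=0.5)
```

With `lam = 0` the objective reduces to the ordinary average reward and policy A scores higher; with the penalty switched on, the steadier policy B is preferred in this toy case. Because the variance term is quadratic in `x`, the objective is nonlinear in the frequencies, which is consistent with the abstract's reference to nonlinear programs.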
