Article ID: | iaor1995318 |
Country: | United States |
Volume: | 42 |
Issue: | 1 |
Start Page Number: | 175 |
End Page Number: | 183 |
Publication Date: | Jan 1994 |
Journal: | Operations Research |
Authors: | Sobel Matthew J. |
Keywords: | programming: multiple criteria |
A stationary policy and an initial state in an MDP (Markov decision process) induce a stationary probability distribution of the reward. The problem analyzed here is that of generating the Pareto optima, in the sense of high mean and low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
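To make the linear-programming claim concrete, the following is a minimal sketch of one plausible formulation, built on the standard occupation-measure LP for gain-rate optimization; the notation (state-action frequencies \(x_{sa}\), rewards \(r(s,a)\), transition probabilities \(p(j \mid s,a)\), and target mean \(\lambda\)) is conventional and not taken from the paper itself, whose exact formulation may differ. Fixing the mean of the stationary reward distribution at \(\lambda\) adds exactly one constraint to the gain-rate LP, and minimizing the second moment at that fixed mean then minimizes the variance; sweeping \(\lambda\) traces out candidate Pareto optima.

```latex
% Sketch (assumed notation): occupation-measure LP with one added
% mean constraint, consistent with the "one more constraint" claim.
\begin{aligned}
\min_{x \ge 0}\quad
  & \sum_{s,a} x_{sa}\, r(s,a)^2
  && \text{(second moment; variance} = \text{this} - \lambda^2\text{)} \\
\text{s.t.}\quad
  & \sum_{a} x_{ja} - \sum_{s,a} x_{sa}\, p(j \mid s,a) = 0
  && \forall j \quad \text{(stationarity of } x\text{)} \\
  & \sum_{s,a} x_{sa} = 1
  && \text{(} x \text{ is a probability distribution)} \\
  & \sum_{s,a} x_{sa}\, r(s,a) = \lambda
  && \text{(the one additional constraint: fixed mean)}
\end{aligned}
```

Since \(\lambda\) is fixed within each solve, subtracting \(\lambda^2\) from the objective changes nothing, so the program remains linear in \(x\) despite targeting the variance.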