Value iteration in a class of communicating Markov decision chains with the average cost criterion


Article ID:	iaor1997663
Country:	United Kingdom
Volume:	34
Issue:	6
Start Page Number:	1848
End Page Number:	1873
Publication Date:	Nov 1996
Journal:	SIAM Journal on Control and Optimization
Authors:	Cavazos-Cadena Rolando
Keywords:	programming: dynamic

Abstract:

Markov decision processes with denumerable state space and discrete time parameter are considered. The performance index of a control policy is the (limsup expected) average cost criterion, and the main structural restrictions on the model are the following: (i) under the action of any stationary policy, the state space is a communicating class; (ii) the cost function has an almost monotone (or penalized) structure; and (iii) some stationary policy induces an ergodic chain with finite average cost. In this context it is shown that the value iteration scheme can be used to construct convergent approximations of a solution to the optimality equation, as well as a sequence of stationary policies whose limit points are optimal.
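
For orientation only, the idea of approximating a solution of the average cost optimality equation by value iteration can be sketched on a small finite model. This is not the scheme analysed in the paper, which treats a denumerable state space under the structural conditions listed above; it is the standard relative value iteration on a hypothetical two-state, two-action example. The names `relative_value_iteration`, `P`, `c` and `ref_state`, and all numbers, are assumptions made for illustration.

```python
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration for a finite average-cost MDP (illustrative).

    P[a, x, y] is the probability of moving from state x to y under action a.
    c[x, a] is the one-step cost of using action a in state x.
    Returns (g, h, policy): approximate optimal average cost, relative value
    function normalised so that h[ref_state] == 0, and a greedy policy.
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)
    g = 0.0
    for _ in range(max_iter):
        # Q[x, a] = c[x, a] + sum_y P[a, x, y] * h[y]
        Q = c + np.einsum("axy,y->xa", P, h)
        Th = Q.min(axis=1)
        g = Th[ref_state]      # current estimate of the optimal average cost
        h_new = Th - g         # renormalise so the iterates stay bounded
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    # Greedy stationary policy with respect to the final relative values.
    Q = c + np.einsum("axy,y->xa", P, h)
    policy = Q.argmin(axis=1)
    return g, h, policy


# Hypothetical toy model: every stationary policy yields a chain with
# strictly positive transition probabilities, so it is communicating
# and aperiodic.
P = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # action 0
    [[0.9, 0.1], [0.1, 0.9]],   # action 1
])
c = np.array([
    [1.0, 1.2],                 # costs in state 0 for actions 0 and 1
    [3.0, 4.0],                 # costs in state 1 for actions 0 and 1
])

g, h, policy = relative_value_iteration(P, c)
```

For this toy model the pair (g, h) satisfies g + h(x) = min_a [c(x, a) + sum_y p(y | x, a) h(y)] up to the tolerance, with g close to 1.5 and h close to (0, 3); the greedy policy uses action 1 in state 0 and action 0 in state 1. Subtracting the value at a reference state is one common normalisation, chosen here because the unnormalised iterates grow linearly in the number of iterations.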
