Article ID: | iaor1990707 |
Country: | United States |
Volume: | 15 |
Issue: | 1 |
Start Page Number: | 1 |
End Page Number: | 7 |
Publication Date: | Feb 1990 |
Journal: | Mathematics of Operations Research |
Authors: | Shwartz Adam, Makowski A.M. |
A general framework is developed for comparing the long-run average cost of a Markov stationary policy with that of another related policy. The underlying methodology extends some ideas of Mandl to randomized policies and to Polish state and action spaces. Sufficient conditions for the applicability of the methodology are given. These conditions, which are easy to verify, have a natural probabilistic interpretation in terms of the ‘stability’ of the chain and the convergence of the control values. The usefulness of the proposed framework is illustrated through several applications. Standard results on the convergence of adaptive policies are readily recovered under conditions that are more transparent than those in the existing literature, and the convergence of randomized policies is handled as a special case. Finally, a novel application to ‘probing controls’ is outlined.