Article ID: | iaor2004689 |
Country: | United States |
Volume: | 22 |
Issue: | 1 |
Start Page Number: | 222 |
End Page Number: | 255 |
Publication Date: | Feb 1997 |
Journal: | Mathematics of Operations Research |
Authors: | Burnetas A.N., Katehakis M.N. |
In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the estimated average reward optimality equations.
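As a rough illustration of what "inflations of the right-hand side of the estimated average reward optimality equations" can look like, the sketch below first writes the standard average reward optimality equation and then one plausible inflated index. The notation is assumed for illustration and is not taken from the abstract: $x$ is a state, $a$ an action, $r(x,a)$ the expected one-step reward, $p(y \mid x,a)$ the transition law, $g$ the optimal average reward, $h$ the bias (relative value) function, hats denote estimates at time $t$, and $N_t(x,a)$ is the number of times action $a$ has been taken in state $x$. The Kullback-Leibler confidence region with radius $\log t / N_t(x,a)$ is likewise an assumed form of the inflation, since the abstract does not give the formula.

```latex
% Average reward optimality equation (standard form, known model):
\[
  g + h(x) \;=\; \max_{a} \Big[\, r(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Big].
\]

% Illustrative inflated index (assumed form): replace the unknown
% transition law by the most favourable law q within a confidence
% region around the estimate \hat{p}_t, using the estimated bias \hat{h}_t.
\[
  U_t(x,a) \;=\; \sup_{q} \Big\{\, r(x,a) + \sum_{y} q(y)\, \hat{h}_t(y)
  \;:\; \mathrm{KL}\big(\hat{p}_t(\cdot \mid x,a) \,\big\|\, q\big)
  \le \frac{\log t}{N_t(x,a)} \Big\}.
\]

% Index-based action choice at state x and time t (illustrative):
\[
  a_t \in \arg\max_{a} U_t(x,a).
\]
```

Under this reading, the index of a rarely tried action is inflated more, because its confidence region is larger, and the inflation shrinks as $N_t(x,a)$ grows relative to $\log t$.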