Article ID: | iaor1988216 |
Country: | Germany |
Volume: | 10 |
Start Page Number: | 161 |
End Page Number: | 166 |
Publication Date: | Oct 1988 |
Journal: | OR Spektrum |
Authors: | Hübner G. |
Keywords: | control processes |
The classical procedure for the adaptive control of average-reward Markov decision processes with an unknown parameter chooses, at each stage, a decision that is optimal for the average-reward problem under the current parameter estimate. In many cases, however, computing the long-run optimal policy at every stage is inefficient or impossible, so successive approximation methods have been proposed and investigated. The paper presents a unifying and generalizing approach that includes both types of methods and also generates many new procedures.
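As an illustration of the successive approximation idea only, and not of the paper's own procedure, the following minimal Python sketch re-estimates the unknown parameter at each stage and then performs a single relative value iteration sweep with the estimated model instead of solving the average-reward problem completely. The two-state model, the reward table, the names `prob_to_state1`, `rvi_step` and `adaptive_control`, and the optimistic initial estimate are all assumptions made for this example; none of them comes from the record above.

```python
import random

STATES = (0, 1)
ACTIONS = (0, 1)
# Hypothetical rewards REWARD[state][action]: state 1 pays more,
# and action 1 costs 0.1 in either state.
REWARD = ((0.0, -0.1), (1.0, 0.9))


def prob_to_state1(state, action, theta):
    """Probability of moving to state 1 (assumed model).

    Only action 1 depends on the unknown parameter theta.
    """
    return theta if action == 1 else 0.5


def q_value(state, action, h, theta):
    """One-step reward plus expected relative value of the next state."""
    p1 = prob_to_state1(state, action, theta)
    return REWARD[state][action] + (1.0 - p1) * h[0] + p1 * h[1]


def rvi_step(h, theta):
    """One relative value iteration sweep under the model with parameter theta.

    Returns the updated relative values (normalised so that h[0] == 0)
    and the greedy decision for each state.
    """
    q = [[q_value(s, a, h, theta) for a in ACTIONS] for s in STATES]
    policy = [max(ACTIONS, key=lambda a, s=s: q[s][a]) for s in STATES]
    best = [q[s][policy[s]] for s in STATES]
    new_h = [b - best[0] for b in best]
    return new_h, policy


def adaptive_control(true_theta, stages, seed=0):
    """Run the adaptive scheme; return (final estimate, average reward)."""
    rng = random.Random(seed)
    successes, trials = 1, 1  # assumed optimistic start: estimate 1.0
    h = [0.0, 0.0]
    state = 0
    total_reward = 0.0
    for _ in range(stages):
        theta_hat = successes / trials        # current parameter estimate
        h, policy = rvi_step(h, theta_hat)    # one sweep, not a full solve
        action = policy[state]
        total_reward += REWARD[state][action]
        p1 = prob_to_state1(state, action, true_theta)
        next_state = 1 if rng.random() < p1 else 0
        if action == 1:                       # only action 1 informs theta
            trials += 1
            successes += next_state
        state = next_state
    return successes / trials, total_reward / stages


if __name__ == "__main__":
    estimate, average = adaptive_control(true_theta=0.9, stages=5000)
    print(f"estimate={estimate:.3f} average reward={average:.3f}")
```

In this toy model the estimate is updated only when action 1 is used, so a purely greedy scheme of this kind may stop learning once the estimate makes action 1 look unattractive.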