Article ID: | iaor19901115 |
Country: | Japan |
Volume: | 25 |
Issue: | 8 |
Start Page Number: | 860 |
End Page Number: | 866 |
Publication Date: | Aug 1989 |
Journal: | Transactions of the Society of Instrument and Control Engineers |
Authors: | Sato Mitsuo, Takeda Hiroshi, Iwasaki Tomomi |
Keywords: | learning, statistics: general |
A Markovian decision process with estimation of unknown transition probabilities serves as an important learning control model for stochastic systems in a wide range of applications. Many studies have addressed the Markovian decision problem, and various schemes have been proposed. Most of them assume that the process is stationary, that is, that the transition probabilities are constant over time. In practice, however, the probabilities are not always constant, so it is of practical significance to consider the problem for nonstationary processes. The authors are particularly interested in cyclic processes, because many real systems may be affected by external conditions arising from cyclic natural phenomena and/or habitual human behavior. With this in view, the authors present an estimation and control scheme for the problem, on the assumption that the unknown probabilities are governed by a parameter which changes its value with cycle
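
The abstract does not spell out the authors' scheme. As a minimal illustrative sketch only, and not the authors' method, the following assumes a known cycle length (`period`), one unknown transition law per phase of the cycle, and a smoothed count estimate of each law. The function name, its parameters, and the `prior` pseudo-count are all hypothetical.

```python
import numpy as np


def estimate_cyclic_transitions(trajectory, n_states, n_actions, period, prior=1.0):
    """Estimate one transition matrix per phase of a cyclic process.

    trajectory: iterable of (t, state, action, next_state) tuples, where t is
    the integer time step at which the transition was observed.
    Assumption: the cycle length `period` is known, so t % period gives the phase.
    Returns an array P of shape (period, n_states, n_actions, n_states) where
    P[k, s, a, s2] estimates the probability of s -> s2 under action a in phase k.
    """
    # Start every count at `prior` (a pseudo-count) so unseen rows stay well defined.
    counts = np.full((period, n_states, n_actions, n_states), prior, dtype=float)

    # Pool observations that fall in the same phase of the cycle.
    for t, s, a, s_next in trajectory:
        counts[t % period, s, a, s_next] += 1.0

    # Normalize each (phase, state, action) row into a probability distribution.
    return counts / counts.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    # Toy data: two states, one action, cycle of length 2.
    observed = [(0, 0, 0, 1), (2, 0, 0, 1), (1, 0, 0, 0)]
    P = estimate_cyclic_transitions(observed, n_states=2, n_actions=1, period=2)
    print(P[0, 0, 0])  # estimated next-state distribution in phase 0
    print(P[1, 0, 0])  # estimated next-state distribution in phase 1
```

Pooling by `t % period` treats observations from the same phase as draws from the same distribution, which is the simplest way to exploit cyclic structure. A stationary estimator would instead merge all phases into a single table.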