Article ID: | iaor20011983 |
Country: | Netherlands |
Volume: | 126 |
Issue: | 2 |
Start Page Number: | 288 |
End Page Number: | 307 |
Publication Date: | Oct 2000 |
Journal: | European Journal of Operational Research |
Authors: | Halici Ugur |
The reinforcement learning scheme proposed in Halici for the random neural network is based on reward and performs well in stationary environments. When the environment is not stationary, however, it gets stuck on the previously learned action, and extinction is not possible. In this paper, the scheme is extended with a weight update rule that takes the internal expectation of reinforcement into account. With the proposed scheme, the system behaves as in learning with reward when the reward for the learned action is not below the internal expectation; otherwise it behaves as in learning with punishment, so that other possibilities can be explored. The scheme makes extinction possible while still converging well to the most rewarding action.
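
The switching behaviour described in the abstract can be illustrated with a minimal sketch. This is not the paper's update rule for the random neural network: it assumes a plain list of per-action weights, a punishment step that simply weakens the chosen action, and an internal expectation kept as an exponential moving average of past rewards. The function name `update` and the parameters `rate` and `decay` are hypothetical and chosen for illustration only.

```python
def update(weights, expectation, action, reward, rate=0.1, decay=0.9):
    """Return (new_weights, new_expectation) after one trial.

    Simplified stand-in for the idea in the abstract; all names and
    constants here are assumptions, not taken from the paper.
    """
    new_weights = list(weights)
    if reward >= expectation:
        # Reward is not below the internal expectation:
        # behave as in learning with reward and strengthen the chosen action.
        new_weights[action] += rate * reward
    else:
        # Reward fell short of the expectation:
        # behave as in learning with punishment and weaken the chosen action,
        # which lets other actions be selected (extinction).
        shortfall = expectation - reward
        new_weights[action] = max(0.0, new_weights[action] - rate * shortfall)
    # Assumed form of the internal expectation: a moving average of reward.
    new_expectation = decay * expectation + (1.0 - decay) * reward
    return new_weights, new_expectation


def best_action(weights):
    """Pick the action with the largest weight (greedy choice)."""
    return max(range(len(weights)), key=lambda i: weights[i])
```

In a non-stationary setting, repeatedly calling `update` after the reward for the learned action drops would keep lowering that action's weight until `best_action` selects a different one, which is the extinction effect the abstract refers to.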