Article ID: | iaor20021379 |
Country: | United Kingdom |
Volume: | 8 |
Issue: | 3 |
Start Page Number: | 305 |
End Page Number: | 315 |
Publication Date: | May 2001 |
Journal: | International Transactions in Operational Research |
Authors: | Ohuchi Azuma, Suzuki Keiji, Yamamoto Masahito |
Keywords: | recreation & tourism |
Since Deep Blue, a chess program, defeated the human world chess champion, recent interest in computer games has shifted to shogi. However, the search space of shogi is larger than that of chess, and in shogi a captured piece can be brought back into play. To overcome these difficulties, we propose a reinforcement learning method based on self-play for obtaining a static evaluation function, that is, a map from shogi positions to real values. Our method is based on temporal difference learning, developed by R. Sutton and applied to backgammon by G. Tesauro. In our method, a neural network that takes a board description of a shogi position as input and outputs the estimated winning percentage from that position is trained by self-play alone, without any expert knowledge of shogi. To demonstrate the effectiveness of the obtained evaluation function, computational experiments are presented.
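The abstract does not give the authors' network architecture or training details, but the core idea it describes (a value network trained during self-play toward its own estimate of the next position, with the game result as the terminal target) can be illustrated compactly. The following is a minimal sketch in Python using tic-tac-toe as a stand-in for shogi; the network size, learning rate, exploration rate, and game count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny value network (one tanh hidden layer): board vector -> estimated
# probability that the first player wins. A stand-in for the paper's shogi
# network; the hidden size and learning rate are arbitrary choices.
H = 32
W1 = rng.normal(0.0, 0.1, (H, 9))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, H)
b2 = 0.0

def value(board):
    """Forward pass; also returns hidden activations for the update step."""
    h = np.tanh(W1 @ board + b1)
    p = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))
    return p, h

def td_update(board, h, p, target, alpha=0.05):
    """One TD(0) step: move V(board) toward `target` by gradient descent
    on the squared error, backpropagating through sigmoid and tanh."""
    global W1, b1, W2, b2
    delta = (target - p) * p * (1.0 - p)      # error signal at the output
    grad_h = delta * W2 * (1.0 - h ** 2)      # backprop through tanh
    W2 += alpha * delta * h
    b2 += alpha * delta
    W1 += alpha * np.outer(grad_h, board)
    b1 += alpha * grad_h

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def self_play_game(eps=0.1):
    """Both sides move by the same network (epsilon-greedy). After each
    move, the previous position's value is bootstrapped toward the new
    position's value; the final position is trained on the game outcome."""
    board = np.zeros(9)
    player = 1
    prev = None                               # (board, h, p) one ply ago
    while True:
        moves = [i for i in range(9) if board[i] == 0]
        vals = []
        for m in moves:                       # evaluate each successor
            board[m] = player
            vals.append(value(board)[0])
            board[m] = 0
        if rng.random() < eps:                # exploration
            m = moves[rng.integers(len(moves))]
        elif player == 1:                     # first player maximizes V
            m = moves[int(np.argmax(vals))]
        else:                                 # second player minimizes V
            m = moves[int(np.argmin(vals))]
        board[m] = player
        p, h = value(board)
        if prev is not None:
            td_update(*prev, target=p)        # TD target: V(next state)
        prev = (board.copy(), h, p)
        w = winner(board)
        if w != 0 or (board != 0).all():
            outcome = 1.0 if w == 1 else 0.0 if w == -1 else 0.5
            td_update(*prev, target=outcome)  # terminal target: game result
            return outcome
        player = -player

# Train by self-play alone, then inspect the learned value of the empty
# board; in tic-tac-toe this estimate tends to drift toward a draw-ish value.
for _ in range(5000):
    self_play_game()
print(f"V(empty board) = {value(np.zeros(9))[0]:.3f}")
```

Evaluating positions rather than state-action pairs mirrors the static evaluation function described in the abstract: the same network both drives move selection during self-play and is the object being trained.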