Article ID: | iaor19931131 |
Country: | India |
Volume: | 13 |
Issue: | 2 |
Start Page Number: | 231 |
End Page Number: | 255 |
Publication Date: | May 1992 |
Journal: | Journal of Information & Optimization Sciences |
Authors: | Yoshida Yuji |
Keywords: | programming: dynamic |
The present paper deals with zero-sum games for multi-armed bandit processes and solves them as control problems of multi-parameter Markov processes. It extends the results of Lawler and Vanderbei to zero-sum games with discounts that depend on the transitions between states. The aim of the paper is to establish the unique optimal values and the optimal Markov strategies, which are constructed from Bellman's equation derived from a value iteration.
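
As a rough illustration of the kind of value iteration the abstract refers to, the following sketch iterates a Bellman-type operator for a small finite zero-sum game in which the discount factor depends on the transition from state s to state t. It is not the paper's construction: the function name `value_iteration`, the array layout, and the restriction to pure actions (computing the max-min value, with the maximizer committing first) are assumptions made only for this example, and the multi-armed bandit structure is not modelled.

```python
def value_iteration(reward, trans, discount, tol=1e-10, max_iter=10_000):
    """Max-min value iteration with transition-dependent discounts.

    Hypothetical layout (assumed for this sketch):
      reward[s][a][b]   -- payoff to the maximizer in state s
      trans[s][a][b][t] -- probability of moving from s to t
      discount[s][t]    -- discount in [0, 1) applied on the move s -> t
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        v_new = []
        for s in range(n):
            row_values = []
            for a, row in enumerate(reward[s]):
                # Minimizer's best reply to the maximizer's action a.
                worst = min(
                    r + sum(trans[s][a][b][t] * discount[s][t] * v[t]
                            for t in range(n))
                    for b, r in enumerate(row)
                )
                row_values.append(worst)
            # Maximizer picks the action with the best guaranteed value.
            v_new.append(max(row_values))
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


# One state, two actions per player, discount 0.5 on the self-transition.
reward = [[[1.0, 3.0], [2.0, 4.0]]]
trans = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
discount = [[0.5]]
values = value_iteration(reward, trans, discount)
```

Because every discount is strictly below 1, the operator is a contraction in the sup norm, so the iteration converges to a unique fixed point; in this toy case the fixed point satisfies v = 2 + 0.5 v. A general simultaneous-move game would require solving a matrix game (with mixed strategies) at each state instead of the pure max-min used here.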