Zero-sum games for discrete-time multi-armed bandit processes with a generalized discount

Article ID: iaor19931131
Country: India
Volume: 13
Issue: 2
Start Page Number: 231
End Page Number: 255
Publication Date: May 1992
Journal: Journal of Information & Optimization Sciences
Authors:
Keywords: programming: dynamic
Abstract:

This paper treats zero-sum games for multi-armed bandit processes and solves them as control problems for multi-parameter Markov processes. It extends the results of Lawler and Vanderbei to zero-sum games whose discounts depend on the transitions between states. The aim is to establish unique optimal values and optimal Markov strategies, both of which are constructed from the Bellman equation obtained through value iteration.
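
As a rough illustration of value iteration with a transition-dependent discount, the following minimal sketch iterates a Shapley-style Bellman operator on a generic finite zero-sum stochastic game. It is not the paper's multi-armed bandit construction: the restriction to two actions per player, the closed-form 2x2 matrix-game solver, and the names `value_2x2`, `value_iteration`, `reward`, `trans`, and `beta` are all assumptions made here for illustration.

```python
from typing import Dict, List

Matrix = List[List[float]]


def value_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff, pure rows
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap, pure columns
    if maximin >= minimax:
        # A pure saddle point exists, so its payoff is the value.
        return maximin
    # No saddle point: standard closed form for the mixed-strategy value.
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(
    reward: List[Matrix],
    trans: List[List[List[Dict[int, float]]]],
    beta: List[List[float]],
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate V(x) = val_{i,j}[ r(x,i,j) + sum_y p(y|x,i,j) * beta(x,y) * V(y) ].

    Hypothetical data layout (an assumption of this sketch):
      reward[x][i][j]   stage payoff in state x under actions i, j
      trans[x][i][j]    dict mapping next state y to its probability
      beta[x][y]        discount applied on the transition x -> y, in [0, 1)
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for x in range(n):
            # Auxiliary matrix game whose value is the updated V(x).
            q = [
                [
                    reward[x][i][j]
                    + sum(p * beta[x][y] * v[y] for y, p in trans[x][i][j].items())
                    for j in range(2)
                ]
                for i in range(2)
            ]
            new_v.append(value_2x2(q))
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(s - t) for s, t in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy two-state example: state 0 always moves to state 1, state 1 is absorbing.
toy_reward = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
toy_trans = [[[{1: 1.0}] * 2] * 2, [[{1: 1.0}] * 2] * 2]
toy_beta = [[0.9, 0.5], [0.9, 0.5]]
toy_values = value_iteration(toy_reward, toy_trans, toy_beta)
```

Because every discount `beta[x][y]` is strictly below 1, the operator is a contraction in the sup norm, which is why the iteration converges to a unique fixed point in this simplified setting.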
