A class of bandit problems yielding myopic optimal strategies

Article ID: iaor1994363
Country: Israel
Volume: 29
Issue: 3
Start Page Number: 625
End Page Number: 632
Publication Date: Sep 1992
Journal: Journal of Applied Probability
Authors: ,
Keywords: stochastic processes, game theory
Abstract:

The authors consider the class of bandit problems in which each of the n ≥ 2 independent arms generates rewards according to one of two reward distributions common to all arms, and discounting is geometric over an infinite horizon. They show that the dynamic allocation index of Gittins and Jones is, in this setting, strictly increasing in the probability that an arm's reward distribution is the better of the two. It follows immediately that myopic strategies are the uniquely optimal strategies in this class of bandit problems, regardless of the value of the discount parameter or the shape of the reward distributions. Some implications of this result for bandits with Bernoulli reward distributions are also given.
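To make the myopic rule concrete, the following is a minimal Python sketch under illustrative assumptions that are not taken from the paper. Every arm is either Bernoulli(p_good) or Bernoulli(p_bad) with 0 < p_bad < p_good < 1. Each arm carries a belief, the probability that it is the better arm, which is updated by Bayes' rule after every pull. The myopic strategy pulls the arm with the highest belief. Because the expected one-step reward belief * p_good + (1 - belief) * p_bad increases with the belief, this is the same as maximizing the immediate expected reward. The names TwoPointBernoulliBandit, myopic_choice and simulate are hypothetical. The code only implements the rule; it does not prove the optimality result stated above.

```python
import random
from dataclasses import dataclass


@dataclass
class TwoPointBernoulliBandit:
    """Hypothetical setup: every arm is Bernoulli(p_good) or Bernoulli(p_bad).

    Assumes 0 < p_bad < p_good < 1 so that Bayes' rule never divides by zero.
    """
    p_good: float
    p_bad: float

    def expected_reward(self, belief: float) -> float:
        # Expected one-step reward of an arm that is "good" with probability `belief`.
        return belief * self.p_good + (1.0 - belief) * self.p_bad

    def update(self, belief: float, reward: int) -> float:
        # Bayes' rule: P(arm is good | observed reward in {0, 1}).
        like_good = self.p_good if reward == 1 else 1.0 - self.p_good
        like_bad = self.p_bad if reward == 1 else 1.0 - self.p_bad
        num = belief * like_good
        return num / (num + (1.0 - belief) * like_bad)


def myopic_choice(beliefs):
    # Myopic rule: pull the arm most likely to be the better distribution
    # (ties go to the lowest index).
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


def simulate(bandit, priors, true_good, steps, discount, seed=0):
    """Run the myopic strategy and return (discounted reward, final beliefs)."""
    rng = random.Random(seed)
    beliefs = list(priors)
    total, weight = 0.0, 1.0
    for _ in range(steps):
        arm = myopic_choice(beliefs)
        p = bandit.p_good if true_good[arm] else bandit.p_bad
        reward = 1 if rng.random() < p else 0
        total += weight * reward  # geometric discounting
        weight *= discount
        beliefs[arm] = bandit.update(beliefs[arm], reward)
    return total, beliefs
```

For example, TwoPointBernoulliBandit(0.7, 0.3).update(0.5, 1) raises a belief of 0.5 to 0.7 after a success. The strategy never looks ahead. According to the abstract, that is enough in this two-distribution class for every discount factor.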
