Article ID: | iaor1989802 |
Country: | United Kingdom |
Volume: | 20 |
Issue: | 2 |
Start Page Number: | 447 |
End Page Number: | 472 |
Publication Date: | Jun 1988 |
Journal: | Advances in Applied Probability |
Authors: | Lai Tze Leung, Ying Zhiliang |
Asymptotic approximations are developed herein for the optimal policies in discounted multi-armed bandit problems in which new projects are continually appearing, commonly known as ‘open bandit problems’ or ‘arm-acquiring bandits’. It is shown that under certain stability assumptions the open bandit problem is asymptotically equivalent to a closed bandit problem in which there is no arrival of new projects, as the discount factor approaches 1. Applications of these results to optimal scheduling of queueing networks are given. In particular, Klimov’s priority indices for scheduling queueing networks are shown to be limits of the Gittins indices for the associated closed bandit problem, and extensions of Klimov’s results to preemptive policies and to unstable queueing systems are given.