Article ID: | iaor20051046 |
Country: | Germany |
Volume: | 58 |
Issue: | 2 |
Start Page Number: | 209 |
End Page Number: | 219 |
Publication Date: | Jan 2003 |
Journal: | Mathematical Methods of Operations Research (Heidelberg) |
Authors: | Wang X., Bickis M.G. |
Keywords: | bandit problems |
One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that, under some regularity conditions, a Gittins-like index exists and is the limit of a monotonic sequence of break-even values that characterize the optimal initial selection of arms in finite-horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. The results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal.
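
The idea of a monotonic sequence of finite-horizon break-even values can be illustrated with a much simpler model than the one treated in the paper. The sketch below is not the authors' method: it assumes immediate (undelayed) Bernoulli responses, a Beta(a, b) prior on the unknown arm, no discounting, and the retirement form of the problem, in which choosing the known arm is treated as stopping. The function name `break_even` and all parameters are hypothetical. For each horizon it finds, by bisection, the known-arm payoff at which pulling the unknown arm first and retiring immediately have equal expected value.

```python
from functools import lru_cache


def break_even(horizon, a=1, b=1, tol=1e-9):
    """Known-arm payoff per step at which both initial selections tie.

    Assumed toy model (not the paper's): Bernoulli unknown arm with a
    Beta(a, b) prior, immediate responses, undiscounted finite horizon.
    """

    def advantage(lam):
        # Expected gain of pulling the unknown arm first over retiring
        # to the known arm for the whole horizon.
        @lru_cache(maxsize=None)
        def value(n, s, f):
            # Optimal expected total reward with n steps left and
            # posterior Beta(s, f) on the unknown arm.
            if n == 0:
                return 0.0
            return max(n * lam, pull(n, s, f))

        def pull(n, s, f):
            # Pull the unknown arm once, then continue optimally.
            p = s / (s + f)  # posterior mean success probability
            return (p * (1.0 + value(n - 1, s + 1, f))
                    + (1.0 - p) * value(n - 1, s, f + 1))

        return pull(horizon, a, b) - horizon * lam

    # advantage is positive at lam = 0 and negative at lam = 1 and
    # decreases in lam, so bisection locates the unique crossing.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if advantage(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(break_even(n), 6))
```

Under these assumptions and a uniform prior, the break-even value for horizon 1 is the prior mean 1/2, and for horizon 2 it is 5/9; the values do not decrease as the horizon grows, which mirrors in a toy setting the monotone sequence whose limit the abstract identifies as the index.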