One-armed bandit models with continuous and delayed processes

Article ID: iaor20051046
Country: Germany
Volume: 58
Issue: 2
Start Page Number: 209
End Page Number: 219
Publication Date: Jan 2003
Journal: Mathematical Methods of Operations Research (Heidelberg)
Authors: ,
Keywords: bandit problems
Abstract:

One-armed bandit processes with continuous delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that under some regularity conditions, a Gittins-like index exists which is the limit of a monotonic sequence of break-even values characterizing optimal initial selections of arms for finite horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. Results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index and the Gittins index strategy is optimal.
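The idea of a monotonic sequence of finite-horizon break-even values can be illustrated with a small numerical sketch. The code below does not implement the paper's model. It swaps in a simpler setting, assumed here only for illustration: immediate (undelayed) Bernoulli responses with a Beta(a, b) prior, a known arm paying a fixed reward `lam` per pull, and a discount factor `beta` (the abstract does not say whether discounting is used). The function name `break_even` and all parameter names are hypothetical. For each horizon n, the break-even value is found by bisection as the `lam` at which pulling the unknown arm first and pulling the known arm first have equal expected value, assuming the advantage of the unknown arm changes sign exactly once on [0, 1].

```python
from functools import lru_cache


def break_even(n, a, b, beta, tol=1e-9):
    """Break-even reward of the known arm for horizon n (illustrative sketch).

    Assumptions (not taken from the paper): Bernoulli responses observed
    immediately, a Beta(a, b) prior on the success probability, and
    discount factor beta.
    """

    def advantage(lam):
        # Expected value of pulling the unknown arm first minus the value
        # of pulling the known arm first, with n pulls remaining.

        @lru_cache(maxsize=None)
        def value(k, s, f):
            # Optimal expected discounted reward with k pulls left and
            # posterior Beta(s, f) on the unknown arm.
            if k == 0:
                return 0.0
            return max(q_known(k, s, f), q_unknown(k, s, f))

        def q_known(k, s, f):
            # Known arm: fixed reward, no change in the posterior.
            return lam + beta * value(k - 1, s, f)

        def q_unknown(k, s, f):
            # Unknown arm: posterior mean reward plus Bayesian update.
            p = s / (s + f)
            return p + beta * (
                p * value(k - 1, s + 1, f)
                + (1 - p) * value(k - 1, s, f + 1)
            )

        return q_unknown(n, a, b) - q_known(n, a, b)

    # Bisection on [0, 1]: the unknown arm is preferred for small lam
    # and the known arm for large lam.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if advantage(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Break-even values for a uniform prior as the horizon grows.
    for horizon in (1, 2, 5, 10, 20):
        print(horizon, round(break_even(horizon, 1, 1, 0.9), 4))
```

With a horizon of one pull the break-even value is just the prior mean a / (a + b). In this simplified setting the values do not decrease as the horizon grows, and their limit plays the role that the abstract assigns to the Gittins-like index.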
