Average optimality in a Poissonian bandit with switching arms

Article ID: iaor19982955
Country: Germany
Volume: 45
Issue: 2
Start Page Number: 265
End Page Number: 280
Publication Date: Jan 1997
Journal: Mathematical Methods of Operations Research (Heidelberg)
Authors: ,
Keywords: control processes
Abstract:

A symmetric Poissonian two-armed bandit becomes, in terms of a posteriori probabilities, a piecewise deterministic Markov decision process. For the case of switching arms, only one of which creates rewards, we explicitly solve the average optimality equation and prove that a myopic policy is average optimal.
