Average optimality in a Poissonian bandit with switching arms

Article ID:	iaor19982955
Country:	Germany
Volume:	45
Issue:	2
Start Page Number:	265
End Page Number:	280
Publication Date:	Jan 1997
Journal:	Mathematical Methods of Operations Research (Heidelberg)
Authors:	Yushkevich Alexander A., Donchev D.S.
Keywords:	control processes

Abstract:

A symmetric Poissonian two-armed bandit becomes, in terms of a posteriori probabilities, a piecewise deterministic Markov decision process. For the case of the switching arms, only one of which creates rewards, we solve explicitly the average optimality equation and prove that a myopic policy is average optimal.
