Article ID: | iaor19982955 |
Country: | Germany |
Volume: | 45 |
Issue: | 2 |
Start Page Number: | 265 |
End Page Number: | 280 |
Publication Date: | Jan 1997 |
Journal: | Mathematical Methods of Operations Research (Heidelberg) |
Authors: | Yushkevich Alexander A., Donchev D.S. |
Keywords: | control processes |
A symmetric Poissonian two-armed bandit becomes, in terms of a posteriori probabilities, a piecewise deterministic Markov decision process. For the case of switching arms, only one of which generates rewards, we explicitly solve the average optimality equation and prove that a myopic policy is average optimal.
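
As an illustration of what "myopic" means in this setting, the sketch below picks the arm with the larger immediate expected reward rate given the current a posteriori probability. It is not taken from the paper: the function names, the parameter `rate` (a hypothetical Poisson reward intensity of the rewarding arm), and the tie-breaking rule are assumptions, and the posterior dynamics under switching are deliberately left out.

```python
def expected_reward_rate(p_arm1: float, arm: int, rate: float = 1.0) -> float:
    """Immediate expected reward rate of pulling `arm`.

    p_arm1 is the a posteriori probability that arm 1 is currently the
    rewarding arm; `rate` is an assumed Poisson intensity of rewards on
    the rewarding arm (hypothetical parameter, not from the abstract).
    """
    if arm not in (1, 2):
        raise ValueError("arm must be 1 or 2")
    return rate * (p_arm1 if arm == 1 else 1.0 - p_arm1)


def myopic_arm(p_arm1: float) -> int:
    """Myopic rule: choose the arm more likely to be the rewarding one.

    Ties at p_arm1 == 0.5 are broken in favour of arm 1 (an arbitrary
    choice made for this sketch).
    """
    if not 0.0 <= p_arm1 <= 1.0:
        raise ValueError("p_arm1 must be a probability in [0, 1]")
    return 1 if p_arm1 >= 0.5 else 2
```

For example, `myopic_arm(0.7)` selects arm 1 and `myopic_arm(0.2)` selects arm 2; in both cases the chosen arm has the higher value of `expected_reward_rate`.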