Some results on two-armed bandits when both projects vary


Article ID: iaor1990674
Country: Israel
Volume: 26
Issue: 3
Start Page Number: 1
End Page Number: 7
Publication Date: Sep 1989
Journal: Journal of Applied Probability
Authors:
Abstract:

In the multi-armed bandit problem, the decision-maker must choose a single project to work on in each period. From the chosen project she receives an immediate reward that depends on the project's current state. In the next period the chosen project makes a stochastic transition to a new state, while projects that are not chosen remain in their current states. What happens in a two-armed bandit context if the projects not chosen do not remain in the same state? Two sufficient conditions are derived for the optimal policy to be myopic: either the transition function for chosen projects exhibits, in a certain sense, uniformly stronger stochastic dominance than the transition function for unchosen projects, or both transition processes are normal martingales whose variance is independent of the history of process choices.
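
The setting is easy to illustrate with a short simulation. The sketch below is a minimal, hypothetical instance (all function names and dynamics are illustrative assumptions, not taken from the paper): both projects follow Gaussian random walks, i.e. normal martingales whose variance does not depend on the history of choices, which loosely matches the abstract's second sufficient condition, and the policy is myopic, choosing each period the project with the larger immediate reward.

    import random

    # Illustrative sketch only; dynamics and names are assumptions,
    # not the paper's model. Each "project" is a scalar state that
    # yields an immediate reward equal to its current value. Unlike
    # the classical bandit, the unchosen project also transitions.

    def transition_chosen(state):
        # Hypothetical dynamics for the project being worked on:
        # a zero-mean Gaussian increment (a normal martingale).
        return state + random.gauss(0.0, 1.0)

    def transition_unchosen(state):
        # Hypothetical dynamics for the idle project. In the classical
        # multi-armed bandit this would simply return `state` unchanged.
        return state + random.gauss(0.0, 0.5)

    def myopic_policy(states):
        # Myopic rule: pick the project with the larger immediate
        # reward, ignoring all future consequences of the choice.
        return max(range(len(states)), key=lambda i: states[i])

    def simulate(horizon=20, seed=0):
        random.seed(seed)
        states = [0.0, 0.0]
        total = 0.0
        for _ in range(horizon):
            i = myopic_policy(states)
            total += states[i]                      # collect immediate reward
            states[i] = transition_chosen(states[i])
            j = 1 - i
            states[j] = transition_unchosen(states[j])  # idle project also moves
        return total

    if __name__ == "__main__":
        print(simulate())

Because both increments here are zero-mean normals with fixed variances, each project's state process is a normal martingale whose variance does not depend on which projects were chosen in the past; under the abstract's second sufficient condition, the myopic rule coded above would then be optimal in this kind of setting.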
