Article ID: iaor2003710
Country: Germany
Volume: 54
Issue: 3
Start Page Number: 387
End Page Number: 393
Publication Date: Jan 2001
Journal: Mathematical Methods of Operations Research (Heidelberg)
Authors: Tind J., Brock M.
Keywords: programming: dynamic
We study the situation where there are a number of ongoing production processes, each yielding a state-dependent standard reward in discrete time. At each time step one may select at most one of these processes for improvement; the selected process then yields a state-dependent non-standard reward (or cost) at that time step and changes its state according to a Markov chain. We show that this model can be cast into a bandit formulation with constructed rewards, and we characterize the optimal policy. Finally, we present a numerical example.
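To make the setup concrete, below is a minimal sketch of a tiny instance of this kind of model, solved by brute-force value iteration over the joint state space. All numbers and names are invented for illustration, and the reward convention assumed here (unselected processes remain frozen and collect their standard rewards, while the selected process collects its non-standard reward and transitions) is one plausible reading of the abstract; the paper's actual bandit reformulation with constructed rewards is not reproduced.

```python
# Illustrative sketch only: brute-force value iteration for a toy instance of the
# model described in the abstract. All data and the exact reward convention are
# assumptions for illustration, not the authors' constructed-rewards bandit method.
import itertools
import numpy as np

# Two processes, each with 2 states (hypothetical data).
standard_reward = [np.array([1.0, 3.0]),      # r_i(s): earned while process i is not selected
                   np.array([2.0, 2.5])]
nonstandard_reward = [np.array([-0.5, 0.8]),  # c_i(s): earned when process i is selected (may be a cost)
                      np.array([0.2, -1.0])]
transition = [np.array([[0.3, 0.7],           # P_i(s, s'): state change of the selected process
                        [0.1, 0.9]]),
              np.array([[0.6, 0.4],
                        [0.2, 0.8]])]
beta = 0.9                                    # discount factor

states = list(itertools.product(range(2), range(2)))   # joint states (s_1, s_2)
V = {s: 0.0 for s in states}

def one_step_value(joint, action, V):
    """Immediate reward plus discounted continuation when `action` (a process
    index, or None for selecting no process) is taken in joint state `joint`."""
    immediate = sum(standard_reward[i][joint[i]] for i in range(2) if i != action)
    if action is None:
        return immediate + beta * V[joint]          # nothing moves
    immediate += nonstandard_reward[action][joint[action]]
    cont = 0.0
    for s_next, p in enumerate(transition[action][joint[action]]):
        nxt = list(joint)
        nxt[action] = s_next
        cont += p * V[tuple(nxt)]
    return immediate + beta * cont

for _ in range(500):                                # value iteration to (near) convergence
    V = {s: max(one_step_value(s, a, V) for a in (None, 0, 1)) for s in states}

# Greedy policy with respect to the converged value function:
policy = {s: max((None, 0, 1), key=lambda a: one_step_value(s, a, V)) for s in states}
print(policy)
```

This exhaustive approach scales exponentially in the number of processes; the point of the bandit reformulation studied in the paper is precisely to avoid working on the joint state space.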