Article ID: iaor20082664
Country: United States
Volume: 55
Issue: 4
Start Page Number: 769
End Page Number: 781
Publication Date: Jul 2007
Journal: Operations Research
Authors: Glazebrook K.D., Gaver D.P., Jacobs P.A., Mitchell H.M., Kirkbride C.
Keywords: search, game theory, programming: dynamic, markov processes, control processes
We consider a scenario in which a single Red wishes to shoot at a collection of Blue targets, one at a time, to maximise some measure of return obtained from Blues killed before Red's own (possible) demise. Such a situation arises in various military contexts, such as the conduct of air defence by Red in the face of Blue SEAD (suppression of enemy air defences). A class of decision processes called multiarmed bandits has previously been deployed to develop optimal policies for Red, under which she attaches a calibrating (Gittins) index to each Blue target and shoots next at the Blue with the largest index value. The current paper elucidates how a range of developments in index theory can accommodate features of such problems that are of practical military import. Such features include levels of risk to Red that are policy dependent, Red having imperfect information about the Blues she faces, an evolving population of Blue targets, and the possibility of Red disengagement. The paper concludes with a numerical study that both compares the performance of (optimal) index policies with a range of competitors and demonstrates the value to Red of (optimal) disengagement.
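The index rule described in the abstract can be illustrated with a minimal simulation sketch. Everything below is an assumption made for illustration: the target parameters, the simple reward-per-unit-risk calibration standing in for the Gittins index, and the static-parameter engagement model (under which the index rule reduces to a fixed priority ordering; the paper's richer settings with policy-dependent risk, imperfect information, evolving target populations, and disengagement require genuinely dynamic indices). None of this is the paper's actual formulation.

```python
import random

# Hypothetical per-target parameters (illustrative, not from the paper):
# reward if the Blue is killed, Red's single-shot kill probability, and
# the probability that Red survives one engagement with this target.
TARGETS = [
    {"name": "Blue-1", "value": 10.0, "p_kill": 0.6, "p_survive": 0.95},
    {"name": "Blue-2", "value": 25.0, "p_kill": 0.3, "p_survive": 0.80},
    {"name": "Blue-3", "value": 15.0, "p_kill": 0.5, "p_survive": 0.90},
]

def index_value(t):
    """Illustrative calibrating index: expected reward per unit of risk
    to Red. A simple stand-in for a Gittins index, not the paper's
    derivation."""
    risk = 1.0 - t["p_survive"]
    return t["value"] * t["p_kill"] / max(risk, 1e-9)

def run_engagement(targets, seed=0):
    """Index policy: repeatedly shoot at the live Blue with the largest
    index until every Blue is killed or Red herself is killed."""
    rng = random.Random(seed)
    total = 0.0
    live = list(targets)
    while live:
        target = max(live, key=index_value)    # index rule: largest index first
        if rng.random() > target["p_survive"]:  # Red's (possible) demise
            return total, "Red killed"
        if rng.random() < target["p_kill"]:     # Blue killed: collect reward
            total += target["value"]
            live.remove(target)
    return total, "all Blues killed"

if __name__ == "__main__":
    reward, outcome = run_engagement(TARGETS)
    print(f"return = {reward:.1f} ({outcome})")
```

Because the parameters here never change, the index need only be computed once per target; the appeal of index policies in the settings the paper studies is that the same one-target-at-a-time calibration remains tractable even when indices must be updated as Red's information and the Blue population evolve.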