Conservation laws, extended polymatroids and multiarmed bandit problems: A polyhedral approach to indexable systems

0.00 Avg rating—0 Votes

Article ID:	iaor20041018
Country:	United States
Volume:	21
Issue:	2
Start Page Number:	257
End Page Number:	306
Publication Date:	May 1996
Journal:	Mathematics of Operations Research
Authors:	Bertsimas D., Nino-Mora J.
Keywords:	game theory

Abstract:

We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible region of achievable performance is a polyhedron called an extended polymatroid, that generalizes the classical polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which leads to an optimal solution having an indexability property (indexable systems). Under a certain condition the indices possess a stronger decomposition property (decomposable systems). The following problems can be analyzed using our theory: multiarmed bandit problems, branching bandits, scheduling of multiclass queues (with or without feedback), scheduling of a batch of jobs. Consequences of our results include: (1) a characterization of indexable systems as systems that satisfy generalized conservation laws, (2) a sufficient condition for indexable systems to be decomposable, (3) a new linear programming proof of the decomposability property of Gittins indices in multiarmed bandit problems, (4) an approach to sensitivity analysis of indexable systems, (5) a characterization of the indices of indexable systems as sums of dual variables, and an economic interpretation of the branching bandit indices in terms of retirement options, (6) an analysis of the indexability of undiscounted branching bandits, (7) a new algorithm to compute the indices of indexable systems (in particular Gittins indices), as fast as the fastest known algorithm, (8) a unification of Klimov's algorithm for multiclass queues and Gittins' algorithm for multiarmed bandits as special cases of the same algorithm, (9) a closed formula for the maximum reward of the multiarmed bandit problem, with a new proof of its submodularity and (10) an understanding of the invariance of the indices with respect to some parameters of the problem. Our approach provides a polyhedral treatment of several classical problems in stochastic and dynamic scheduling and is able to address variations such as: discounted versus undiscounted cost criterion, rewards versus taxes, discrete versus continuous time, and linear versus nonlinear objective functions.

Reviews

Required fields are marked *. Your email address will not be published.