Article ID: | iaor201111435 |
Volume: | 39 |
Issue: | 7 |
Start Page Number: | 1315 |
End Page Number: | 1324 |
Publication Date: | Jul 2012 |
Journal: | Computers & Operations Research |
Authors: | Zheng Li, Li Na, Wang Weiping, Zhang Zhicong, Zhong Shouyan, Hu Kaishun |
Keywords: | programming: dynamic, combinatorial optimization, heuristics |
We address an unrelated parallel machine scheduling problem using R‐learning, an average‐reward reinforcement learning (RL) method. Jobs of different types arrive dynamically according to independent Poisson processes, so the arrival time and the due date of each job are stochastic. We convert the scheduling problem into an RL problem by constructing elaborate state features, actions, and a reward function; the state features and actions are defined by fully exploiting prior domain knowledge. Minimizing the average reward per decision time step is equivalent to minimizing the scheduling objective, i.e. mean weighted tardiness. We apply an on‐line R‐learning algorithm with function approximation to solve the RL problem. Computational experiments demonstrate that R‐learning learns an optimal or near‐optimal policy from experience in a dynamic environment and outperforms four effective heuristic priority rules (WSPT, WMDD, ATC, and WCOVERT) on all test problems.
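To make the summarized method concrete, the following is a minimal sketch of R‐learning (Schwartz's average‐reward algorithm) with linear function approximation at a dispatching decision point. It is not the authors' implementation: the feature size, action set, step sizes, and reward shape are hypothetical stand‐ins for the paper's elaborate definitions, and the reward is taken as the negative weighted‐tardiness cost so that the standard maximizing R‐learning update minimizes mean weighted tardiness, matching the abstract's objective.

```python
import numpy as np

# Sketch of R-learning with linear function approximation for a
# machine-scheduling decision epoch. All sizes and constants below
# are illustrative assumptions, not values from the paper.

N_FEATURES = 8          # assumed length of the state-feature vector
N_ACTIONS = 4           # e.g., one action per candidate dispatching choice
ALPHA, BETA, EPSILON = 0.1, 0.01, 0.1   # assumed step sizes / exploration

rng = np.random.default_rng(0)
weights = np.zeros((N_ACTIONS, N_FEATURES))  # one linear Q per action
rho = 0.0               # running estimate of the average reward per step


def q_value(phi, a):
    """Linear action-value estimate Q(s, a) = w_a . phi(s)."""
    return weights[a] @ phi


def select_action(phi):
    """Epsilon-greedy selection over the approximate action values."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q_value(phi, a) for a in N_ACTIONS * [0] and range(N_ACTIONS)]))


def r_learning_step(phi, a, reward, phi_next):
    """One R-learning update at a decision epoch.

    `reward` is assumed to be the negative weighted-tardiness cost
    incurred since the previous decision, so maximizing the average
    reward minimizes mean weighted tardiness.
    """
    global rho
    q_sa = q_value(phi, a)
    q_max = max(q_value(phi, b) for b in range(N_ACTIONS))
    best_next = max(q_value(phi_next, b) for b in range(N_ACTIONS))
    delta = reward - rho + best_next - q_sa
    if q_sa == q_max:                    # rho is updated only after greedy actions
        rho += BETA * delta
    weights[a] += ALPHA * delta * phi    # semi-gradient TD update of Q weights
```

At each decision epoch (a job arrival or a machine becoming free), the scheduler would extract the feature vector for the current state, call `select_action` to pick a dispatching action, and, after observing the incurred cost and the next state, feed both back through `r_learning_step`.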