This paper introduces a Reinforcement Learning approach to better generalize heuristic dispatching rules on the Job-shop Scheduling Problem (JSP). Current models for the JSP do not focus on generalization, although, as we show in this work, it is key to learning better heuristics for the problem. A well-known technique to improve generalization is to learn on increasingly complex instances using Curriculum Learning (CL). However, as many works in the literature indicate, this technique can suffer from catastrophic forgetting when transferring the learned skills between different problem sizes. To address this issue, we introduce a novel Adversarial Curriculum Learning (ACL) strategy, which dynamically adjusts the difficulty level during the learning process to revisit the worst-performing instances. This work also presents a deep learning model for the JSP that is equivariant w.r.t. the job definition and size-agnostic. Experiments on Taillard's and Demirkol's instances show that the presented approach significantly outperforms the current state-of-the-art models on the JSP, reducing the average optimality gap from 19.35\% to 10.46\% on Taillard's instances and from 38.43\% to 18.85\% on Demirkol's instances. Our implementation is available online.
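To make the ACL idea concrete, the following is a minimal sketch of how a curriculum could dynamically reweight instance sizes toward those where the policy currently performs worst. This is an illustrative assumption, not the authors' implementation: the class name, the exponential-smoothing update, and the gap-proportional sampling scheme are all hypothetical choices.

```python
# Illustrative sketch of an adversarial curriculum sampler (hypothetical,
# not the paper's code). The agent's running optimality gap per instance
# size drives the sampling distribution, so poorly handled sizes are
# revisited more often, counteracting catastrophic forgetting.
import random


class AdversarialCurriculum:
    """Samples JSP instance sizes, favoring those with the worst performance."""

    def __init__(self, sizes, smoothing=0.9):
        self.sizes = sizes                  # e.g. [(6, 6), (10, 10), (15, 15)]
        self.gap = {s: 1.0 for s in sizes}  # running optimality-gap estimate
        self.smoothing = smoothing

    def sample_size(self):
        # Sampling probability proportional to the current gap estimate:
        # harder (worse-solved) sizes get drawn more frequently.
        total = sum(self.gap.values())
        weights = [self.gap[s] / total for s in self.sizes]
        return random.choices(self.sizes, weights=weights, k=1)[0]

    def update(self, size, observed_gap):
        # Exponentially smoothed update of the per-size performance estimate.
        self.gap[size] = (self.smoothing * self.gap[size]
                          + (1 - self.smoothing) * observed_gap)
```

Under this sketch, each training iteration would draw a size via `sample_size()`, generate and solve instances of that size with the current policy, and feed the measured optimality gap back through `update()`.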