The scheduling of production resources (such as assigning jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the different job scheduling problems, this work addresses the job shop scheduling problem (JSSP). The JSSP is an NP-hard combinatorial optimization problem (COP), for which exhaustive search becomes infeasible. Simple heuristics such as first-in-first-out (FIFO) and longest processing time (LPT), and metaheuristics such as tabu search, are often adopted to solve the problem by truncating the search space. These methods become impractical for large problem sizes, as their solutions are either far from the optimum or time-consuming to compute. In recent years, using deep reinforcement learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we provide a novel DRL approach to the JSSP, examining two objectives: generalization and solution effectiveness. In particular, we employ the proximal policy optimization (PPO) algorithm, a policy-gradient method that has been found to perform well in the constrained dispatching of jobs. We incorporate an OSM in the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth on a set of available benchmark instances and compared with results reported by other groups.
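To make the dispatching heuristics mentioned above concrete, the following is a minimal sketch of FIFO and LPT used as priority rules on a toy JSSP instance. It is an illustration only, not the method of this work: the function name `dispatch`, the instance encoding (each job as an ordered list of `(machine, processing_time)` pairs), and the simplified greedy loop, which schedules one operation at a time rather than simulating time steps, are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed encoding: a job is an ordered list of (machine, processing_time) operations.
Job = List[Tuple[int, int]]


def dispatch(jobs: List[Job], rule: str = "FIFO") -> int:
    """Greedily build a schedule with a priority rule and return its makespan."""
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    next_op = [0] * len(jobs)        # index of the next unscheduled operation per job
    job_ready = [0] * len(jobs)      # earliest start time of each job's next operation
    machine_free = [0] * n_machines  # time at which each machine becomes idle
    remaining = sum(len(job) for job in jobs)

    while remaining:
        # Jobs that still have unscheduled operations.
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if rule == "FIFO":
            # Prefer the job whose next operation became ready first.
            chosen = min(candidates, key=lambda j: (job_ready[j], j))
        elif rule == "LPT":
            # Prefer the job whose next operation has the longest processing time.
            chosen = min(candidates, key=lambda j: (-jobs[j][next_op[j]][1], j))
        else:
            raise ValueError(f"unknown rule: {rule}")

        machine, duration = jobs[chosen][next_op[chosen]]
        # Respect both the job's precedence order and the machine's availability.
        start = max(job_ready[chosen], machine_free[machine])
        end = start + duration
        job_ready[chosen] = end
        machine_free[machine] = end
        next_op[chosen] += 1
        remaining -= 1

    return max(machine_free)


if __name__ == "__main__":
    # Hypothetical 2-job, 2-machine instance.
    toy = [[(0, 3), (1, 2)], [(1, 4), (0, 1)]]
    for rule in ("FIFO", "LPT"):
        print(rule, dispatch(toy, rule))
```

Rules of this kind run in time roughly proportional to the number of operations, which is why they scale well, but they commit to each choice without lookahead; that is the gap in solution quality a learned dispatching policy aims to close.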