There is growing interest in integrating machine learning techniques and optimization to solve challenging optimization problems. In this work, we propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP). The aim is to build a greedy-like heuristic able to learn over a distribution of JSSP instances that differ in the number of jobs and machines. The need for fast scheduling methods is well known, and it arises in many areas, from transportation to healthcare. We model the JSSP as a Markov Decision Process and then exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme, in which the action taken by the agent is influenced by policy considerations on the state-value function. The procedures are adapted to the challenging nature of the JSSP, where the state and the action space change not only across instances but also after each decision. To handle the variability in the number of jobs and operations in the input, we model the agent using two incident LSTM models, a special type of deep neural network. Experiments show that the algorithm reaches good solutions in a short time, proving that it is possible to derive new greedy heuristics purely from learning-based methodologies. Benchmarks were generated in comparison with the commercial solver CPLEX. As expected, the model can generalize, to some extent, to larger problems and to instances drawn from a distribution different from the one used in training.
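To make the MDP formulation concrete, the sketch below simulates one greedy dispatching episode for a toy JSSP instance: the state consists of machine and job availability times, an action selects which job's next operation to schedule, and the transition updates the schedule. This is a minimal illustration only; the function names, the feature-free stochastic policy interface, and the instance encoding are assumptions for exposition, and it does not replicate the paper's two-LSTM actor-critic agent.

```python
import numpy as np

def simulate(proc, mach, policy_logits_fn, rng):
    """Roll out one greedy dispatching episode for a JSSP instance.

    proc[j, o] : processing time of operation o of job j
    mach[j, o] : machine required by that operation
    At each step the action is choosing which job's next operation to
    schedule, mirroring the MDP view where state and action space shrink
    after every decision. Returns the makespan and the schedule.
    """
    n_jobs, n_ops = proc.shape
    next_op = np.zeros(n_jobs, dtype=int)       # next unscheduled op per job
    job_ready = np.zeros(n_jobs)                # time each job becomes free
    mach_ready = np.zeros(mach.max() + 1)       # time each machine becomes free
    trajectory = []
    while (next_op < n_ops).any():
        # Action space: jobs that still have operations left.
        avail = np.flatnonzero(next_op < n_ops)
        logits = policy_logits_fn(avail, next_op, job_ready, mach_ready)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        j = avail[rng.choice(len(avail), p=probs)]
        o = next_op[j]
        m = mach[j, o]
        start = max(job_ready[j], mach_ready[m])
        finish = start + proc[j, o]
        job_ready[j] = mach_ready[m] = finish   # state transition
        next_op[j] += 1
        trajectory.append((j, o, start))
    return job_ready.max(), trajectory

# Usage on a hypothetical 2-job, 2-machine instance with a uniform policy
# (a learned policy would replace the zero logits with network outputs):
proc = np.array([[3.0, 2.0], [2.0, 4.0]])
mach = np.array([[0, 1], [1, 0]])
uniform_policy = lambda avail, *state: np.zeros(len(avail))
makespan, schedule = simulate(proc, mach, uniform_policy, np.random.default_rng(0))
```

In an actor-critic setting, `policy_logits_fn` would be the actor network scoring each available job, while a critic would estimate the state value to reduce the variance of the policy-gradient update.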