Recent years have seen rising interest in using machine learning, particularly reinforcement learning (RL), for production scheduling problems of varying complexity. The general approach is to cast the scheduling problem as a Markov Decision Process (MDP) and then use a simulation implementing the MDP to train an RL agent. Since existing studies rely on (sometimes) complex simulations for which the code is unavailable, the reported experiments are hard, or, in the case of stochastic environments, impossible to reproduce accurately. Furthermore, there is a vast array of RL designs to choose from. To make RL methods widely applicable to production scheduling and to demonstrate their value to industry, standardized model descriptions - covering both the production setup and the RL design - and a standardized validation scheme are prerequisites. Our contribution is threefold: first, we standardize the description of production setups used in RL studies based on established nomenclature; second, we classify RL design choices from existing publications; third, we propose recommendations for a validation scheme focusing on reproducibility and sufficient benchmarking.