The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management in reacting to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon, i.e., multi-stage, supply chain. Traditional inventory optimization methods aim to determine a static reordering policy; such policies cannot adjust to dynamic changes like those observed during the COVID-19 crisis. On the other hand, these conventional strategies offer the advantage of being interpretable, which is a crucial feature for supply chain managers who must communicate decisions to their stakeholders. To combine the strengths of both, we propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies while being as flexible and environment-agnostic as other deep learning-based reinforcement learning solutions. Specifically, we use Neural Additive Models as the interpretable dynamic policy of a reinforcement learning agent and show that this approach is competitive with a standard fully connected policy. Finally, we use the interpretability property to gain insights into a complex ordering strategy for a simple, linear three-echelon inventory supply chain.
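As a rough illustration of why a Neural Additive Model lends itself to interpretation, the sketch below builds a toy additive policy in NumPy: each state feature passes through its own small network, and the per-feature outputs are summed. This is a minimal sketch under stated assumptions, not the architecture or training setup used in this work. The class name `NAMPolicy`, the hidden size, the random untrained weights, and the example state (one inventory level per echelon) are all hypothetical.

```python
import numpy as np


class NAMPolicy:
    """Toy additive policy: one small ReLU network per input feature (hypothetical sizes)."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # Row i holds the weights of the network that sees only feature i.
        self.w1 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.b1 = np.zeros((n_features, hidden))
        self.w2 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.bias = 0.0

    def contributions(self, state):
        """Return the additive contribution of each feature to the policy output."""
        state = np.asarray(state, dtype=float)
        hidden = np.maximum(0.0, state[:, None] * self.w1 + self.b1)  # ReLU layer
        return (hidden * self.w2).sum(axis=1)

    def score(self, state):
        """Raw policy output: bias plus the sum of per-feature contributions."""
        return self.bias + self.contributions(state).sum()

    def act(self, state):
        """Assumed action mapping: treat the score as an order quantity, clipped at zero."""
        return max(0.0, self.score(state))


# Hypothetical state: current inventory level at each of three echelons.
policy = NAMPolicy(n_features=3)
state = np.array([12.0, 5.0, 30.0])
print("per-feature contributions:", policy.contributions(state))
print("order quantity:", policy.act(state))
```

Because the output is a sum of one-dimensional functions, each feature's contribution can be plotted against its value on its own, which is the kind of inspection a fully connected policy does not directly allow.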