The COVID-19 pandemic has highlighted the importance of supply chains and the role of digital management in reacting to dynamic changes in the environment. In this work, we focus on developing dynamic inventory ordering policies for a multi-echelon, i.e., multi-stage, supply chain. Traditional inventory optimization methods aim to determine a static reordering policy, so these policies cannot adjust to dynamic changes such as those observed during the COVID-19 crisis. On the other hand, conventional strategies offer the advantage of being interpretable, which is a crucial feature for supply chain managers who must communicate decisions to their stakeholders. To address the lack of adaptability without giving up this advantage, we propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies while being as flexible and environment-agnostic as other deep learning-based reinforcement learning solutions. We propose to use Neural Additive Models as the interpretable dynamic policy of a reinforcement learning agent and show that this approach is competitive with a standard fully connected policy. Finally, we use the interpretability property to gain insights into a complex ordering strategy for a simple, linear three-echelon inventory supply chain.
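To make the idea of an additive policy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a state vector with one scalar feature per echelon (for example, an inventory level) and an action vector with one order quantity per echelon. The class name `NAMPolicy`, the hidden size, the `tanh` activation, and the random initialization are all hypothetical choices made for illustration. No training loop is shown. The point is only that each input feature passes through its own small network and the outputs are summed, so the contribution of every feature to every order quantity can be read off directly.

```python
import numpy as np


class NAMPolicy:
    """Additive policy sketch: action = bias + sum_i f_i(state[i]).

    Each f_i is an independent one-hidden-layer network that sees only
    feature i. All names and sizes here are illustrative assumptions.
    """

    def __init__(self, n_features, n_actions, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One set of weights per input feature (no weights mix features).
        self.w1 = rng.normal(0.0, 0.5, size=(n_features, hidden))
        self.b1 = np.zeros((n_features, hidden))
        self.w2 = rng.normal(0.0, 0.5, size=(n_features, hidden, n_actions))
        self.bias = np.zeros(n_actions)

    def contributions(self, state):
        """Return an (n_features, n_actions) array of per-feature terms."""
        state = np.asarray(state, dtype=float)
        # Hidden activations of each feature network: (n_features, hidden).
        h = np.tanh(state[:, None] * self.w1 + self.b1)
        # Output of each feature network: (n_features, n_actions).
        return np.einsum("fh,fha->fa", h, self.w2)

    def act(self, state):
        """Sum the per-feature terms to obtain the raw (unclipped) action."""
        return self.bias + self.contributions(state).sum(axis=0)


# Hypothetical three-echelon example: one inventory level per echelon.
policy = NAMPolicy(n_features=3, n_actions=3)
state = [10.0, 5.0, 2.0]
per_feature = policy.contributions(state)  # row i: effect of feature i
action = policy.act(state)                 # one raw order value per echelon
```

Because no weight mixes two features, changing one entry of `state` changes only the matching row of `per_feature`. Plotting each row against its input feature yields the kind of per-feature shape function that makes such a policy inspectable.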