Although safety stock optimisation has been studied for more than 60 years, most companies still use simplistic means to calculate safety stock levels. This is partly due to a mismatch between existing analytical methods, which emphasise deriving provably optimal solutions, and companies, which prefer to sacrifice optimality in favour of more realistic problem settings. A newly emerging method from the field of Artificial Intelligence (AI), namely Reinforcement Learning (RL), offers promise in finding optimal solutions while accommodating more realistic problem features. Unlike analytical models, RL treats the problem as a black-box simulation environment, which mitigates the risk of oversimplifying reality. As a result, assumptions on the stock-keeping policy can be relaxed and a larger number of problem variables can be accommodated. While RL has been popular in other domains, its applications in safety stock optimisation remain scarce. In this paper, we investigate three RL methods, namely Q-Learning, Temporal Difference Advantage Actor-Critic and Multi-agent Temporal Difference Advantage Actor-Critic, for optimising safety stock in a linear chain of independent agents. We find that RL can simultaneously optimise both the safety stock level and the order quantity parameters of an inventory policy. Classical safety stock optimisation models, by contrast, optimise only the safety stock level, while the order quantity is predetermined by simple rules. This allows RL to model more complex supply chain procurement behaviour. However, RL takes longer to arrive at solutions, so future research is needed to identify and improve the trade-offs between using AI and using mathematical models.
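The abstract does not describe the simulation model or the agents' state and action spaces. The following is therefore only a minimal sketch of the black-box idea. It assumes a hypothetical single-stage inventory with zero lead time, lost sales, a fixed ordering cost and integer demand drawn uniformly from 0 to 4. Tabular Q-learning interacts with the problem only through a `step` function. The greedy policy it learns fixes both when to order (an implied reorder point, which embeds the safety stock) and how much to order. All names, costs and hyperparameters are illustrative assumptions. This is not the authors' Q-Learning setup, and it does not cover the actor-critic or multi-agent variants.

```python
import random

# Hypothetical single-stage inventory (illustrative assumption, not the paper's model):
# zero lead time, lost sales, integer demand uniform on {0, ..., 4}.
MAX_INV = 20       # storage capacity; also the largest state
MAX_ORDER = 10     # largest order quantity the agent may place
FIXED_COST = 5.0   # fixed cost per order placed
HOLD_COST = 1.0    # holding cost per unit left at the end of a period
LOST_COST = 10.0   # penalty per unit of unmet demand


def step(inventory, order, rng):
    """Black-box simulator: one period, returns (next_inventory, reward)."""
    stock = min(inventory + order, MAX_INV)  # order arrives immediately
    demand = rng.randint(0, 4)
    sold = min(stock, demand)
    next_inventory = stock - sold
    cost = ((FIXED_COST if order > 0 else 0.0)
            + HOLD_COST * next_inventory
            + LOST_COST * (demand - sold))
    return next_inventory, -cost


def train(steps=200_000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning over (inventory level, order quantity) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * (MAX_ORDER + 1) for _ in range(MAX_INV + 1)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randint(0, MAX_ORDER)  # explore
        else:
            row = q[state]
            action = row.index(max(row))        # exploit
        next_state, reward = step(state, action, rng)
        # Q-learning update towards the bootstrapped one-step target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Order quantity chosen at each inventory level."""
    return [row.index(max(row)) for row in q]


def average_cost(policy, periods=10_000, seed=1):
    """Simulate a fixed policy and return its mean cost per period."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(periods):
        state, reward = step(state, policy[state], rng)
        total -= reward
    return total / periods


q_table = train()
policy = greedy_policy(q_table)
# Implied reorder point: highest inventory level at which the policy still orders.
reorder_point = max((s for s, a in enumerate(policy) if a > 0), default=None)
print("order quantity by inventory level:", policy)
print("implied reorder point:", reorder_point)
print("avg cost/period, learned:", round(average_cost(policy), 2))
print("avg cost/period, never order:", round(average_cost([0] * (MAX_INV + 1)), 2))
```

Because the agent chooses an order quantity per inventory level rather than a single fixed lot size, the learned table can express richer ordering behaviour than a rule that fixes the order quantity in advance. This loosely mirrors the abstract's point that RL optimises the safety stock level and the order quantity together.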