E-commerce with major online retailers is changing the way people consume. The goal of increasing delivery speed while remaining cost-effective poses significant new challenges for supply chains as they race to satisfy growing and fast-changing demand. In this paper, we consider a warehouse with a Robotic Mobile Fulfillment System (RMFS), in which a fleet of robots stores and retrieves shelves of items and brings them to human pickers. To adapt to changing demand, uncertainty, and differentiated service levels (e.g., prime vs. regular), the storage location assigned to a shelf can be modified dynamically. The objective is to define a dynamic storage policy that minimises the average cycle time the robots need to fulfil requests. We propose formulating this system as a Partially Observable Markov Decision Process and using a deep Q-learning agent from reinforcement learning to learn an efficient real-time storage policy, leveraging repeated simulated experience and demand forecasts. Additionally, we develop a rollout strategy that enhances our method by exploiting the additional information available at each time step. In simulations comparing our method to traditional storage rules used in industry, preliminary results show improvements of up to 14\% in travel time.
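The Q-learning update at the core of the approach above can be sketched in miniature. The following is an illustrative stand-in only: a linear Q-function replaces the paper's deep network, and a synthetic one-step environment (reward = negated, noisy travel time of the chosen storage slot) replaces the warehouse simulator. All dimensions, state features, and travel times are hypothetical.

```python
import numpy as np

# Illustrative sketch of the Q-learning update underlying the agent.
# A linear Q-function stands in for a deep network; a synthetic one-step
# environment stands in for the warehouse simulator. All sizes, features,
# and travel times below are made up for illustration.

rng = np.random.default_rng(0)
N_FEATURES, N_ACTIONS = 4, 3           # hypothetical state/action sizes
W = np.zeros((N_ACTIONS, N_FEATURES))  # linear Q-function weights

def q_values(state):
    """Q(s, a) for every action a, as a linear function of the state."""
    return W @ state

def select_action(state, epsilon=0.1):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_values(state)))

def td_update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * np.max(q_values(next_state))
    td_error = target - q_values(state)[action]
    W[action] += alpha * td_error * state

# Synthetic training loop: the reward for choosing a storage slot is the
# negated travel time of that slot, plus a little noise.
travel_time = np.array([3.0, 1.0, 2.0])  # hypothetical per-slot travel times
state = rng.random(N_FEATURES)
for _ in range(3000):
    action = select_action(state)
    reward = -travel_time[action] + rng.normal(scale=0.1)
    next_state = rng.random(N_FEATURES)
    td_update(state, action, reward, next_state)
    state = next_state

# After training, the greedy policy should prefer the fastest slot (index 1).
best = int(np.argmax(q_values(np.full(N_FEATURES, 0.5))))
```

In the paper's setting, the state would instead encode the partial observations and demand forecasts available to the agent, and the linear function would be replaced by a deep network trained from the same temporal-difference target.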