This project leverages advances in multi-agent reinforcement learning (MARL) to improve the efficiency and flexibility of order-picking systems for commercial warehouses. We envision a warehouse of the future in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents should coordinate their movements and actions in the warehouse to maximise performance (e.g. order throughput) under given resource constraints. Established industry methods based on heuristics require large engineering efforts to optimise for inherently variable warehouse configurations. In contrast, the MARL framework can be flexibly applied to any warehouse configuration (e.g. size, layout, number/types of workers, item replenishment frequency), and the agents learn through trial and error how to cooperate optimally with one another. This paper details the current status of the R&D effort initiated by Dematic and the University of Edinburgh towards a general-purpose and scalable MARL solution for the order-picking problem in realistic warehouses.