We present a novel reinforcement learning (RL)-based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots perform various pickup and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two-level approach to solve it. At the higher level, we solve task allocation by formulating it as a Markov Decision Process and choosing appropriate rewards to minimize the Total Travel Delay (TTD). At the lower level, we use a decentralized navigation scheme based on ORCA that enables each robot to perform these tasks independently and to avoid collisions with other robots and dynamic obstacles. We combine the two levels by defining the rewards of the higher level as the feedback from the lower-level navigation algorithm. We perform an extensive evaluation in complex warehouse layouts with large numbers of agents and highlight the benefits over state-of-the-art algorithms based on myopic pickup-distance minimization and regret-based task selection. We observe improvements of up to 14% in task completion time and up to 40% in computing collision-free trajectories for the robots.
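To make the two-level structure described above concrete, the following Python sketch shows one possible way a high-level allocator could treat the travel delay reported by the navigation layer as its reward signal. It is a minimal illustration assuming a tabular Q-learning allocator and a stubbed ORCA call; the names `TaskAllocator`, `navigate_orca`, and `episode_step` are hypothetical and not drawn from the paper.

```python
# Hypothetical sketch of the two-level loop: a high-level task allocator
# (MDP / Q-learning) whose reward is the feedback (travel delay) returned
# by a lower-level, decentralized ORCA navigation routine.
import random
from collections import defaultdict


class TaskAllocator:
    """High-level MDP: state = (robot id, pending task set), action = chosen task."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.95):
        self.q = defaultdict(float)  # Q[(state, task)] -> value estimate
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def select_task(self, state, pending_tasks):
        # Epsilon-greedy selection over the pending pickup-and-delivery tasks.
        if random.random() < self.epsilon:
            return random.choice(pending_tasks)
        return max(pending_tasks, key=lambda t: self.q[(state, t)])

    def update(self, state, task, reward, next_state, next_tasks):
        # One-step Q-learning backup using the navigation feedback as reward.
        best_next = max((self.q[(next_state, t)] for t in next_tasks), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, task)] += self.alpha * (target - self.q[(state, task)])


def navigate_orca(robot, task):
    """Placeholder for the decentralized ORCA navigation layer.

    A real implementation would run local collision avoidance (e.g. via an
    RVO/ORCA library) and return the realized travel delay for the task.
    """
    raise NotImplementedError


def episode_step(allocator, robot, state, pending_tasks):
    task = allocator.select_task(state, pending_tasks)
    delay = navigate_orca(robot, task)  # feedback from the lower level
    reward = -delay                     # reward shaped to minimize Total Travel Delay
    remaining = [t for t in pending_tasks if t != task]
    next_state = (robot, frozenset(remaining))
    allocator.update(state, task, reward, next_state, remaining)
    return next_state, remaining
```

The design choice illustrated here is only the coupling between the levels: the allocator never models collisions explicitly; it simply learns from the delays that the decentralized navigation layer actually incurs.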