Multi-agent path finding in dynamic crowded environments is of great academic and practical value for real-world multi-robot systems. To improve the effectiveness and efficiency of the communication and learning processes during path planning in such environments, we introduce an algorithm called Attention and BicNet based Multi-agent path planning with effective reinforcement (AB-Mapper), built on the actor-critic reinforcement learning framework. In this framework, we use BicNet, which provides a communication mechanism between agents, in the actor network to achieve intra-team coordination, and we propose a centralized critic network that selectively allocates attention weights to surrounding agents. This attention mechanism allows an individual agent to automatically learn a better evaluation of its actions by also taking the behaviours of its surrounding agents into account. Compared with the state-of-the-art method Mapper, our AB-Mapper is more effective (85.86% vs. 81.56% success rate) in solving general path finding problems with dynamic obstacles. In addition, in crowded scenarios our method outperforms Mapper by a large margin, with a gap of more than 40% in every experiment.
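The abstract does not give the critic's equations, so the following is only a minimal sketch of how a critic can "selectively allocate attention weights to surrounding agents". It assumes scaled dot-product attention over neighbour embeddings, in the style of attention-based multi-agent actor-critic methods; the projection matrices `W_q`, `W_k`, `W_v`, the linear value head, and all function names are hypothetical illustrations, not AB-Mapper's actual architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix W (list of rows) by vector v."""
    return [dot(row, v) for row in W]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_neighbors(
    own_emb: Sequence[float],
    neighbor_embs: Sequence[Sequence[float]],
    W_q: Matrix,
    W_k: Matrix,
    W_v: Matrix,
) -> Tuple[Vector, Vector]:
    """Return (attention weights, weighted neighbour summary) for one agent.

    own_emb: embedding of agent i's observation/action (assumed input).
    neighbor_embs: embeddings of the surrounding agents j != i.
    W_q, W_k, W_v: hypothetical query/key/value projection matrices.
    """
    query = matvec(W_q, own_emb)
    keys = [matvec(W_k, e) for e in neighbor_embs]
    values = [matvec(W_v, e) for e in neighbor_embs]
    # Scaled dot-product scores: how relevant each neighbour is to agent i.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # Attention-weighted sum of neighbour values.
    dim = len(values[0])
    summary = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, summary


def critic_value(
    own_emb: Sequence[float],
    neighbor_summary: Sequence[float],
    w_out: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Hypothetical linear value head: Q_i = w_out . [own_emb ; summary] + bias."""
    features = list(own_emb) + list(neighbor_summary)
    return dot(w_out, features) + bias
```

With identity projections, an agent whose embedding is `[1.0, 0.0]` gives its aligned neighbour `[1.0, 0.0]` a weight of about 0.67 and the orthogonal neighbour `[0.0, 1.0]` about 0.33, so the value estimate depends more on the neighbour judged relevant. In a trained critic the projections and value head would be learned parameters rather than fixed matrices.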