Safe and efficient cooperative planning for multiple robots in environments shared with pedestrians holds promise for practical applications. In this work, we propose a novel multi-robot, socially aware, efficient cooperative planner based on off-policy multi-agent reinforcement learning (MARL) under partial, dimension-varying observations and imperfect perception. We adopt a temporal-spatial graph (TSG)-based social encoder to better extract the importance of the social relations between each robot and the pedestrians in its field of view (FOV). We also introduce a K-step lookahead reward setting into the multi-robot RL framework to avoid aggressive, intrusive, short-sighted, and unnatural motion decisions by the robots. Moreover, we improve the traditional centralized critic network with a multi-head global attention module that better aggregates local observation information among different robots to guide individual policy updates. Finally, multi-group experimental results verify the effectiveness of the proposed cooperative motion planner.
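To make the attention-based aggregation concrete, the following is a minimal sketch of generic scaled dot-product multi-head attention applied to per-robot observation embeddings. It is not the architecture used in this work, which the abstract does not specify. The function name `multi_head_attention`, the embedding size, the number of heads, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a row vector (d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def multi_head_attention(obs_emb, w_q, w_k, w_v, num_heads):
    """Aggregate per-robot embeddings with scaled dot-product attention.

    obs_emb: list of n embeddings, each of length d_model (hypothetical input).
    w_q, w_k, w_v: d_model x d_model projections (hypothetical parameters).
    Returns (context, weights): context is n x d_model, and
    weights[h][i][j] is how much robot i attends to robot j in head h.
    """
    n = len(obs_emb)
    d_model = len(obs_emb[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    q = [matvec(x, w_q) for x in obs_emb]
    k = [matvec(x, w_k) for x in obs_emb]
    v = [matvec(x, w_v) for x in obs_emb]

    context = [[] for _ in range(n)]
    weights = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # this head's feature slice
        head_weights = []
        for i in range(n):
            scores = [
                sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi]))
                / math.sqrt(d_head)
                for j in range(n)
            ]
            w = softmax(scores)
            head_weights.append(w)
            # Weighted sum of value vectors, concatenated across heads.
            context[i].extend(
                sum(w[j] * v[j][c] for j in range(n)) for c in range(lo, hi)
            )
        weights.append(head_weights)
    return context, weights


# Illustrative setup: 3 robots, 8-dimensional embeddings, 2 heads (all assumed).
random.seed(0)
n_robots, d_model, num_heads = 3, 8, 2
obs_emb = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_robots)]
w_q, w_k, w_v = (
    [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_model)]
    for _ in range(3)
)
context, weights = multi_head_attention(obs_emb, w_q, w_k, w_v, num_heads)
```

In a centralized-critic setting, the resulting `context` rows could be fed to a value head alongside each robot's own embedding; how the actual critic consumes them is not stated in the abstract.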