We tackle the problem of cooperative visual exploration, where multiple agents must jointly explore unseen regions as fast as possible based on visual signals. Classical planning-based methods often suffer from expensive per-step computation and limited expressiveness of complex cooperation strategies. By contrast, reinforcement learning (RL) has recently become a popular paradigm for this challenge due to its capability to model arbitrarily complex strategies with minimal inference overhead. In this paper, we extend the state-of-the-art single-agent visual navigation method, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based planning module, the Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attentions. In addition, we implement a few multi-agent enhancements that process each agent's local information into an aligned spatial representation for more precise planning. Finally, we perform policy distillation to extract a meta policy that significantly improves the generalization capability of the final policy. We call this overall solution Multi-Agent Active Neural SLAM (MAANS). MAANS substantially outperforms classical planning-based baselines for the first time in the photo-realistic 3D simulator Habitat. Code and videos can be found at https://sites.google.com/view/maans.
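To make the "hierarchical spatial self-attention" idea concrete, below is a minimal PyTorch sketch of a two-stage attention block in the spirit of Spatial-TeamFormer: attention over spatial cells within each agent's map, followed by attention across agents at each spatial location. All tensor shapes, layer sizes, class and variable names, and the specific two-stage ordering are illustrative assumptions, not the exact architecture from the paper.

```python
# Hedged sketch of hierarchical spatial self-attention (not the paper's exact design).
import torch
import torch.nn as nn


class HierarchicalSpatialAttention(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Stage 1: self-attention over spatial cells within each agent's own map.
        self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Stage 2: self-attention across agents at each spatial location.
        self.team_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_agents, height * width, embed_dim) -- per-agent spatial features.
        b, n, s, d = x.shape

        # Spatial self-attention within each agent's map, with a residual connection.
        x_sp = x.reshape(b * n, s, d)
        x_sp, _ = self.spatial_attn(x_sp, x_sp, x_sp)
        x = x + x_sp.reshape(b, n, s, d)

        # Team-wise self-attention across agents, one spatial cell at a time.
        x_tm = x.permute(0, 2, 1, 3).reshape(b * s, n, d)
        x_tm, _ = self.team_attn(x_tm, x_tm, x_tm)
        x = x + x_tm.reshape(b, s, n, d).permute(0, 2, 1, 3)
        return x


if __name__ == "__main__":
    # Example: 2 agents, an 8x8 feature map, 64-dim embeddings.
    feats = torch.randn(1, 2, 8 * 8, 64)
    out = HierarchicalSpatialAttention()(feats)
    print(out.shape)  # torch.Size([1, 2, 64, 64])
```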