We consider the task of visual indoor exploration with multiple agents, where the agents must cooperatively explore an entire indoor region in as few steps as possible. Classical planning-based methods often suffer from expensive computation at every inference step and limited expressiveness of their cooperation strategies. By contrast, reinforcement learning (RL) has become a popular paradigm for this challenge because it can model arbitrarily complex strategies with minimal inference overhead. We extend the state-of-the-art single-agent RL solution, Active Neural SLAM (ANS), to the multi-agent setting by introducing a novel RL-based global-goal planner, the Spatial Coordination Planner (SCP), which leverages spatial information from each individual agent in an end-to-end manner and effectively guides the agents toward distinct spatial goals with high exploration efficiency. SCP consists of a transformer-based relation encoder that captures inter-agent interactions and a spatial action decoder that produces accurate goals. In addition, we implement several multi-agent enhancements that process each agent's local information into an aligned spatial representation for more precise planning. Our final solution, Multi-Agent Active Neural SLAM (MAANS), combines all these techniques and substantially outperforms four planning-based methods and various RL baselines in the photo-realistic physical testbed Habitat.
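To make the described architecture concrete, below is a minimal PyTorch sketch of an SCP-style planner, assuming team-wise self-attention as the relation encoder and a coarse grid-cell parameterization of global goals. All class names, feature dimensions, and the `grid_size` goal grid are illustrative assumptions; the text above specifies only a transformer-based relation encoder and a spatial action decoder, not these implementation details.

```python
import torch
import torch.nn as nn

class SpatialCoordinationPlanner(nn.Module):
    """Hedged sketch of an SCP-style global-goal planner.

    Hypothetical structure: per-agent spatial embeddings are mixed by a
    transformer relation encoder (inter-agent attention), then a spatial
    action decoder scores a grid of candidate global goals per agent.
    """

    def __init__(self, feat_dim=128, num_heads=4, num_layers=2, grid_size=8):
        super().__init__()
        # Per-agent embedding of locally observed spatial features
        # (e.g., output of a CNN over each agent's egocentric map).
        self.agent_proj = nn.Linear(feat_dim, feat_dim)
        # Relation encoder: self-attention across the team lets each
        # agent's representation attend to every other agent's.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=num_heads, batch_first=True)
        self.relation_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Spatial action decoder (simplified here to a linear head):
        # scores grid_size x grid_size candidate goal cells per agent.
        self.goal_head = nn.Linear(feat_dim, grid_size * grid_size)
        self.grid_size = grid_size

    def forward(self, agent_feats):
        # agent_feats: (batch, num_agents, feat_dim) spatial embeddings.
        h = self.agent_proj(agent_feats)
        h = self.relation_encoder(h)   # inter-agent attention
        logits = self.goal_head(h)     # (batch, num_agents, grid^2)
        # Categorical distribution over goal cells; sampling a cell
        # yields each agent's long-term navigation goal.
        return torch.distributions.Categorical(logits=logits)


# Usage: a team of 3 agents, batch of 1; sample one goal cell per agent.
scp = SpatialCoordinationPlanner()
feats = torch.randn(1, 3, 128)
goal_dist = scp(feats)
goal_cells = goal_dist.sample()  # shape (1, 3): indices into the 8x8 grid
print(goal_cells)
```

In an ANS-style pipeline, the sampled grid cell would serve as each agent's long-term global goal, which a downstream local planner then navigates toward; that division of labor follows the global/local structure of ANS rather than any detail given in the abstract itself.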