Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies from 2D maps, scene graphs, or image sequences. Since this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations for policy learning in this floor-level task can be prohibitively impractical, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav task based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, operate simultaneously, using online-fused 3D points as observations. Through extensive experiments, we show that this framework can dramatically improve ObjectNav performance by learning from 3D scene representations. Our framework achieves the best performance among all modular-based methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.