State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of `where to look?' for an object and `how to navigate to (x, y)?'. Our key insight is that `where to look?' can be treated purely as a perception problem, and learned without environment interactions. Building on this insight, we propose a network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. We train the potential function network with supervised learning on a passive dataset of top-down semantic maps, and integrate it into a modular framework to perform ObjectGoal navigation. Experiments on Gibson and Matterport3D demonstrate that our method achieves state-of-the-art results for ObjectGoal navigation while incurring up to 1,600x less computational cost for training. Code and pre-trained models are available at: https://vision.cs.utexas.edu/projects/poni/
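The abstract describes combining two predicted potential functions over a semantic map to choose where to look next. A minimal sketch of that decision step is below; the function name, the convex weighting, and the restriction to frontier cells are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_long_term_goal(area_potential, object_potential,
                          frontier_mask, alpha=0.5):
    """Pick the map cell with the highest combined potential among frontiers.

    A hypothetical sketch: the two potentials are blended with a convex
    weight `alpha`, and the argmax is taken only over frontier cells
    (the boundary between explored and unexplored space).
    """
    combined = alpha * object_potential + (1.0 - alpha) * area_potential
    # Exclude non-frontier cells from consideration.
    combined = np.where(frontier_mask, combined, -np.inf)
    return np.unravel_index(np.argmax(combined), combined.shape)

# Toy example on a 2x2 map: cell (1, 1) is a frontier with the
# highest blended potential, so it is chosen as the long-term goal.
area = np.array([[0.1, 0.2], [0.3, 0.4]])
obj = np.array([[0.9, 0.1], [0.2, 0.8]])
frontier = np.array([[True, False], [False, True]])
goal = select_long_term_goal(area, obj, frontier, alpha=0.5)  # -> (1, 1)
```

In the full system this goal would be handed to a local planner that solves `how to navigate to (x, y)?' independently of the perception module.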