We investigate the task of object goal navigation in unknown environments, where the target is specified by a semantic label (e.g., find a couch). Such a navigation task is especially challenging because it requires an understanding of semantic context in diverse settings. Most prior work tackles this problem under the assumption of a discrete action policy, whereas we present an approach with continuous control, which brings it closer to real-world applications. We propose a deep neural network architecture and loss function to predict dense cost maps that implicitly encode semantic context and guide the robot towards the semantic goal. We also present a novel way of fusing mid-level visual representations in our architecture to provide additional semantic cues for cost map prediction. The estimated cost maps are then used by a sampling-based model predictive controller (MPC) to generate continuous robot actions. Preliminary experiments suggest that the cost maps generated by our network are suitable for the MPC and can guide the agent to the semantic goal more efficiently than a baseline approach. The results also highlight the importance of mid-level representations for navigation, which improve the success rate by 7 percentage points.
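To make the cost-map-to-control pipeline concrete, the following is a minimal sketch of a single sampling-based MPC step that consumes a predicted local cost map. All names, the unicycle motion model, the grid resolution, and the velocity bounds are illustrative assumptions and do not reflect the authors' actual implementation.

```python
# Hypothetical sketch: one receding-horizon step of a sampling-based MPC
# that scores random velocity rollouts against a predicted cost map.
import numpy as np

def mpc_step(cost_map, pose, n_samples=256, horizon=10, dt=0.2,
             resolution=0.05, v_max=0.5, w_max=1.0):
    """Return the first (v, w) command of the lowest-cost sampled rollout.

    cost_map: 2D array of predicted costs (lower = closer to the goal),
              defined on a local grid with the robot at the map center.
    pose:     (x, y, yaw) of the robot in the same local frame.
    """
    h, w = cost_map.shape
    # Sample candidate action sequences: linear and angular velocities.
    v = np.random.uniform(0.0, v_max, size=(n_samples, horizon))
    om = np.random.uniform(-w_max, w_max, size=(n_samples, horizon))

    x = np.full(n_samples, pose[0])
    y = np.full(n_samples, pose[1])
    yaw = np.full(n_samples, pose[2])
    total_cost = np.zeros(n_samples)

    for t in range(horizon):
        # Roll out an assumed unicycle motion model.
        yaw = yaw + om[:, t] * dt
        x = x + v[:, t] * np.cos(yaw) * dt
        y = y + v[:, t] * np.sin(yaw) * dt
        # Accumulate the predicted cost at each rollout state.
        col = np.clip((x / resolution + w / 2).astype(int), 0, w - 1)
        row = np.clip((y / resolution + h / 2).astype(int), 0, h - 1)
        total_cost += cost_map[row, col]

    best = int(np.argmin(total_cost))
    # Receding horizon: execute only the first action of the best rollout.
    return v[best, 0], om[best, 0]
```

In a receding-horizon loop, this step would be repeated after each new cost map prediction, so the continuous commands adapt as the network refines its estimate of where the semantic goal lies.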