We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. These instructions often contain descriptions of objects in the environment. To navigate accurately and efficiently, it is critical to build a map that accurately represents both the spatial locations and the semantic information of the objects in the environment. However, enabling a robot to build such a map is extremely challenging, because environments often contain diverse objects with varied attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task that requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results on the VLN-CE dataset show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively. Code is available at https://github.com/PeihaoChen/WS-MGMap.
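As a rough illustration of what "multi-granularity" could mean for a single map cell, the sketch below stores a semantic-class vector and a fine-grained feature vector side by side in a small top-down grid. It is not the authors' implementation: the names `MapCell`, `build_map`, and `write_observation`, the one-hot class encoding, and the constants `NUM_CLASSES` and `FEATURE_DIM` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

NUM_CLASSES = 3   # hypothetical number of semantic classes
FEATURE_DIM = 2   # hypothetical size of the fine-grained feature vector


@dataclass
class MapCell:
    """One cell of a top-down grid map (illustrative only)."""
    semantic: List[float] = field(default_factory=lambda: [0.0] * NUM_CLASSES)
    fine: List[float] = field(default_factory=lambda: [0.0] * FEATURE_DIM)

    def vector(self) -> List[float]:
        # Both granularities are kept together: class scores, then fine details.
        return self.semantic + self.fine


def build_map(height: int, width: int) -> List[List[MapCell]]:
    """Create an empty height x width grid of cells."""
    return [[MapCell() for _ in range(width)] for _ in range(height)]


def write_observation(grid: List[List[MapCell]], row: int, col: int,
                      class_id: int, fine_features: Sequence[float]) -> None:
    """Record an observed object at (row, col) as a one-hot class plus features."""
    cell = grid[row][col]
    cell.semantic = [1.0 if i == class_id else 0.0 for i in range(NUM_CLASSES)]
    cell.fine = list(fine_features)


grid = build_map(2, 2)
# Hypothetical observation: class 2, with two made-up fine-grained feature values.
write_observation(grid, 0, 1, class_id=2, fine_features=[0.8, 0.1])
```

In this sketch, a downstream module such as a waypoint predictor would read `cell.vector()` for every cell, so it sees the object's class and its finer attributes at the same map location.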