A novel framework is proposed to incrementally collect landmark-based graph memory and use the collected memory for image goal navigation. Given a target image to search for, an embodied robot utilizes semantic memory to find the target in an unknown environment. The semantic graph memory is collected from panoramic observations of an RGB-D camera without knowing the robot's pose. In this paper, we present a topological semantic graph memory (TSGM), which consists of (1) a graph builder that takes the observed RGB-D image to construct a topological semantic graph, (2) a cross graph mixer module that takes the collected nodes to obtain contextual information, and (3) a memory decoder that takes the contextual memory as input to determine an action toward the target. On the task of image goal navigation, TSGM significantly outperforms competitive baselines by +5.0-9.0% on success rate and +7.0-23.5% on SPL, indicating that TSGM finds efficient paths. Additionally, we demonstrate our method on a mobile robot in real-world image goal navigation scenarios.
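To make the three-module pipeline concrete, the following is a minimal, runnable sketch of the data flow described above (graph builder → cross graph mixer → memory decoder). All class names, method signatures, and the toy feature logic are hypothetical illustrations of the interfaces, not the paper's implementation.

```python
# Hypothetical sketch of the TSGM pipeline; names and logic are illustrative only.
import numpy as np

class SemanticGraph:
    """Topological semantic graph: image nodes (places) and object nodes (landmarks)."""
    def __init__(self):
        self.image_nodes = []   # one feature vector per visited place
        self.object_nodes = []  # one feature vector per detected landmark
        self.edges = []         # (image_idx, object_idx) affiliation edges

class GraphBuilder:
    """(1) Incrementally grows the graph from each RGB-D observation (no pose used)."""
    def update(self, graph, image_feat, object_feats):
        graph.image_nodes.append(image_feat)
        i = len(graph.image_nodes) - 1
        for f in object_feats:
            graph.object_nodes.append(f)
            graph.edges.append((i, len(graph.object_nodes) - 1))
        return graph

class CrossGraphMixer:
    """(2) Mixes image and object nodes; here, toy message passing by averaging neighbors."""
    def mix(self, graph):
        context = [np.array(f, dtype=float) for f in graph.image_nodes]
        for i, j in graph.edges:
            context[i] = (context[i] + np.asarray(graph.object_nodes[j], dtype=float)) / 2.0
        return np.stack(context)

class MemoryDecoder:
    """(3) Maps contextual memory + goal image to an action; here, a toy similarity argmax."""
    ACTIONS = ["forward", "turn_left", "turn_right", "stop"]
    def act(self, context, goal_feat):
        scores = context @ np.asarray(goal_feat, dtype=float)
        return self.ACTIONS[int(np.argmax(scores)) % len(self.ACTIONS)]

# Usage: one navigation step with random stand-in features.
graph = SemanticGraph()
builder, mixer, decoder = GraphBuilder(), CrossGraphMixer(), MemoryDecoder()
obs_feat, obj_feats, goal_feat = np.random.rand(8), [np.random.rand(8)], np.random.rand(8)
graph = builder.update(graph, obs_feat, obj_feats)
action = decoder.act(mixer.mix(graph), goal_feat)
```

In the actual method, the mixer and decoder are learned modules; the sketch only fixes the interfaces between the three components so the incremental memory-building and action-selection loop is visible.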