We address the navigation problem in which an agent follows natural language instructions while observing its environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions in visual perception. We propose a neural agent that uses the elements of spatial configurations, and we investigate their influence on the agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with the spatial configurations in the instruction. Our neural agent improves over strong baselines in seen environments and shows competitive performance in unseen environments. Additionally, the experimental results demonstrate that explicitly modeling spatial semantic elements in the instructions can improve the model's grounding and spatial reasoning.