The Vision-and-Language Navigation (VLN) task requires an embodied agent to navigate to a remote location by following a natural language instruction. Previous methods usually adopt a sequence model (e.g., a Transformer or an LSTM) as the navigator. In this paradigm, the sequence model predicts an action at each step from a maintained navigation state, which is generally represented as a one-dimensional vector. However, crucial navigation clues for embodied navigation (i.e., the object-level environment layout) are discarded, since the maintained vector is essentially unstructured. In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. Specifically, we utilise a graph-based feature, instead of a vector-based one, to represent the navigation state. Accordingly, we devise a Reinforced Layout clues Miner (RLM) to mine and detect the most crucial layout graph for long-term navigation via a customised reinforcement learning strategy. Moreover, we propose a Structured Evolving Module (SEM) to maintain the structured graph-based state during navigation, where the state gradually evolves to learn object-level spatial-temporal relationships. Experiments on the R2R and R4R datasets show that the proposed SEvol model improves VLN models' performance by large margins, e.g., +3% absolute SPL for NvEM and +8% for EnvDrop on the R2R test set.