The main challenge in vision-and-language navigation (VLN) is understanding natural-language instructions in an unseen environment. A key limitation of conventional VLN algorithms is that a single mistaken action can cause the agent to stop following the instructions or to explore unnecessary regions, leading it onto an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct recent misled actions. We show that an exploitation policy that moves the agent toward a well-chosen local goal among unvisited but observable states outperforms a method that moves the agent to a previously visited state. We also highlight the need to imagine regretful explorations with semantically meaningful clues. The key to our approach is understanding the placement of objects around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects. By combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and demonstrates strong generalization. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1% and SPL by 20.6% on the SOON benchmark.
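The abstract describes SOS only as a "category-wise 2D Fourier transform of detected objects", so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes detections arrive as hypothetical `(category, x0, y0, x1, y1)` boxes on a coarse grid, rasterizes one binary mask per category, and keeps the magnitude of each mask's 2D DFT. The local-goal step is also hypothetical: `pick_local_goal` scores candidate unvisited but observable nodes by cosine similarity to an assumed target vector, which stands in for whatever the exploitation policy actually uses.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical detection format: (category, x0, y0, x1, y1) in grid cells,
# with x1 and y1 exclusive. Real detectors output pixel boxes instead.
Detection = Tuple[str, int, int, int, int]
Grid = List[List[float]]


def category_masks(detections: Sequence[Detection], categories: Sequence[str],
                   height: int, width: int) -> Dict[str, Grid]:
    """Rasterize detected boxes into one binary occupancy mask per category."""
    masks = {c: [[0.0] * width for _ in range(height)] for c in categories}
    for cat, x0, y0, x1, y1 in detections:
        if cat not in masks:
            continue  # ignore categories outside the chosen vocabulary
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                masks[cat][y][x] = 1.0
    return masks


def dft2_magnitude(grid: Grid) -> Grid:
    """Naive O((HW)^2) 2D discrete Fourier transform, returning |F(u, v)|."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    if grid[y][x]:  # empty cells contribute nothing to the sum
                        angle = -2 * math.pi * (u * y / h + v * x / w)
                        acc += grid[y][x] * cmath.exp(1j * angle)
            out[u][v] = abs(acc)
    return out


def scene_object_spectrum(detections: Sequence[Detection], categories: Sequence[str],
                          height: int, width: int) -> Dict[str, Grid]:
    """SOS-style feature sketch: one magnitude spectrum per object category."""
    masks = category_masks(detections, categories, height, width)
    return {c: dft2_magnitude(masks[c]) for c in categories}


def flatten(spectrum: Dict[str, Grid], categories: Sequence[str]) -> List[float]:
    """Concatenate per-category spectra into a single feature vector."""
    return [val for c in categories for row in spectrum[c] for val in row]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_local_goal(candidates: Dict[str, List[float]], target: List[float]) -> str:
    """Hypothetical scoring rule: choose the candidate node whose SOS vector
    is most similar to an assumed target vector."""
    return max(candidates, key=lambda node: cosine(candidates[node], target))
```

Because only the magnitude is kept, moving a chair box from `(1, 1, 3, 4)` to `(4, 3, 6, 6)` on an 8 by 8 grid leaves its spectrum unchanged, and |F(0, 0)| equals the box area (6 cells). The feature therefore reflects the shape and relative arrangement of same-category objects rather than their absolute position. A practical version would replace the naive loop with an FFT routine such as `numpy.fft.fft2`.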