"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation. This problem has attracted rising attention in recent years due to its wide application in autonomous driving, vacuum cleaner, and rescue robot. A navigation agent is supposed to have various intelligent skills, such as visual perceiving, mapping, planning, exploring and reasoning, etc. Building such an agent that observes, thinks, and acts is a key to real intelligence. The remarkable learning ability of deep learning methods empowered the agents to accomplish embodied visual navigation tasks. Despite this, embodied visual navigation is still in its infancy since a lot of advanced skills are required, including perceiving partially observed visual input, exploring unseen areas, memorizing and modeling seen scenarios, understanding cross-modal instructions, and adapting to a new environment, etc. Recently, embodied visual navigation has attracted rising attention of the community, and numerous works has been proposed to learn these skills. This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey. We summarize the benchmarks and metrics, review different methods, analysis the challenges, and highlight the state-of-the-art methods. Finally, we discuss unresolved challenges in the field of embodied visual navigation and give promising directions in pursuing future research.