A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental, interdisciplinary research topic toward this goal and is receiving increasing attention from the natural language processing, computer vision, robotics, and machine learning communities. In this paper, we review contemporary studies in the emerging field of VLN, covering tasks, evaluation metrics, methods, and more. Through a structured analysis of current progress and challenges, we highlight the limitations of current VLN research and opportunities for future work. This paper serves as a thorough reference for the VLN research community.