Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment by following human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades agent performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that balances human prior knowledge about the training samples with the agent's learning progress on them. We develop the principle of curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, generalizability, and training efficiency of current state-of-the-art navigation agents without increasing model complexity.
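The idea of balancing human prior knowledge against agent learning progress can be illustrated with a generic self-paced curriculum selector. This is a minimal sketch under stated assumptions, not the training procedure of this work: the `Sample` fields, the `prior_difficulty` score in [0, 1], the `1 - exp(-loss)` squashing, and the `prior_weight` and `threshold` parameters are all hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical; the abstract does not specify a schema.
    instruction_id: str
    prior_difficulty: float  # assumed human-assigned difficulty in [0, 1]
    last_loss: float = 0.0   # agent's most recent loss on this sample


def curriculum_weights(samples, threshold, prior_weight=0.5):
    """Return a 0/1 weight per sample: 1 if it is easy enough to train on now.

    The score blends the human prior with a loss-based estimate of how
    hard the agent currently finds the sample (an assumed formulation).
    """
    weights = []
    for s in samples:
        # Map a non-negative loss into [0, 1) so both terms share a scale.
        learned_difficulty = 1.0 - math.exp(-s.last_loss)
        score = (prior_weight * s.prior_difficulty
                 + (1.0 - prior_weight) * learned_difficulty)
        weights.append(1.0 if score <= threshold else 0.0)
    return weights


def select_batch(samples, threshold, batch_size, rng, prior_weight=0.5):
    """Draw a batch from the samples admitted at the current threshold."""
    weights = curriculum_weights(samples, threshold, prior_weight)
    eligible = [s for s, w in zip(samples, weights) if w > 0]
    if not eligible:
        # Fall back to the easiest sample so training can always proceed.
        eligible = [min(samples, key=lambda s: s.prior_difficulty)]
    return rng.sample(eligible, min(batch_size, len(eligible)))
```

In such a scheme, `threshold` would typically be raised over the course of training so that harder samples are admitted gradually; the schedule for raising it is left open here.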