Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

Language instructions play an essential role in natural language grounded navigation tasks. However, navigators trained with limited human-annotated instructions may have difficulty accurately capturing key information from a complicated instruction at different timesteps, leading to poor navigation performance. In this paper, we train a more robust navigator that can dynamically extract crucial factors from long instructions by using an adversarial attacking paradigm. Specifically, we propose a Dynamic Reinforced Instruction Attacker (DR-Attacker), which learns to mislead the navigator into moving to the wrong target by destroying the most instructive information in the instructions at different timesteps. By formulating perturbation generation as a Markov Decision Process, DR-Attacker is optimized with a reinforcement learning algorithm to generate perturbed instructions sequentially during navigation, according to a learnable attack score. The perturbed instructions, which serve as hard samples, are then used to improve the robustness of the navigator through an effective adversarial training strategy and an auxiliary self-supervised reasoning task. Experimental results on both the Vision-and-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks show the superiority of our proposed method over state-of-the-art methods. Moreover, visualization analysis shows the effectiveness of the proposed DR-Attacker, which can successfully attack crucial information in the instructions at different timesteps. Code is available at https://github.com/expectorlin/DR-Attacker.
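To make the idea of a learnable attack score optimized by reinforcement learning more concrete, the following is a minimal toy sketch, not the DR-Attacker implementation. It assumes a single decision step, a softmax policy over word positions, and a plain REINFORCE update with a running-mean baseline. The names `train_attacker`, `perturb`, and `toy_reward`, the `<unk>` mask token, and the `key_words` set are all hypothetical; in particular, `toy_reward` only stands in for a signal that the navigator was misled, which in the paper would come from the navigator itself at each timestep.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perturb(words, index, mask_token="<unk>"):
    """Return a copy of the instruction with one word destroyed."""
    out = list(words)
    out[index] = mask_token
    return out


def toy_reward(words, index, key_words):
    """Hypothetical stand-in for the navigator's failure signal:
    1.0 if the destroyed word was instructive, else 0.0."""
    return 1.0 if words[index] in key_words else 0.0


def train_attacker(words, key_words, steps=2000, lr=0.5, seed=0):
    """REINFORCE on a softmax policy over word positions (toy setting)."""
    rng = random.Random(seed)
    scores = [0.0] * len(words)  # learnable attack scores (logits)
    baseline = 0.0               # running mean reward, reduces variance
    for _ in range(steps):
        probs = softmax(scores)
        index = rng.choices(range(len(words)), weights=probs)[0]
        reward = toy_reward(words, index, key_words)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d log pi(index) / d scores[j] = one_hot(index)[j] - probs[j]
        for j in range(len(scores)):
            grad = (1.0 if j == index else 0.0) - probs[j]
            scores[j] += lr * advantage * grad
    return scores


instruction = "walk past the sofa and stop at the kitchen door".split()
attack_scores = train_attacker(instruction, key_words={"sofa", "kitchen"})
target = max(range(len(instruction)), key=attack_scores.__getitem__)
print(" ".join(perturb(instruction, target)))
```

In this toy setting the scores concentrate on the words whose removal is rewarded, so the perturbed instruction loses one of its landmark words. The method described in the abstract differs in that the perturbation is generated sequentially during navigation, and the perturbed instructions are then fed back to the navigator as hard samples for adversarial training.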
