Pretrained language models have achieved super-human performance on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has spurred skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions from SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore this question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than those fine-tuned on SQuAD 1.1, yet they exhibit a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack that reveals artifacts of SQuAD 2.0 which current MRC models are learning.