We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more incorrect predictions when answer entities are given different names? Such failures would indicate that models rely too heavily on entity knowledge to answer questions, and therefore may generalize poorly when facts about the world change or when questions are asked about novel entities. To systematically audit model robustness, we propose a general and scalable method that replaces person names with names drawn from a variety of sources, ranging from common English names to names from other languages to arbitrary strings. Across four datasets and three pretrained model architectures, MRC models consistently perform worse when entities are renamed, with particularly large accuracy drops on datasets constructed via distant supervision. We also find large differences between models: SpanBERT, which is pretrained with span-level masking, is more robust than RoBERTa, despite having similar accuracy on unperturbed test data. Inspired by this, we experiment with span-level and entity-level masking as continual pretraining objectives and find that they can further improve the robustness of MRC models.
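To make the renaming perturbation concrete, the sketch below illustrates one way such a substitution could be implemented. It is only a hypothetical simplification: it assumes person-name spans have already been identified (e.g., by an off-the-shelf NER tagger), that each distinct name is mapped consistently across the passage and answer, and that the substitute-name pool mixes common English names, names from other languages, and arbitrary strings, as described above. The function name and arguments are illustrative, not the paper's actual pipeline.

```python
import random

def rename_entities(passage: str, answer: str, person_spans: list[str],
                    name_pool: list[str], seed: int = 0) -> tuple[str, str]:
    """Replace each distinct person name with a name sampled from name_pool,
    applying the same substitution consistently in the passage and the answer.

    This is a minimal sketch: it does not handle partial mentions
    (e.g., a surname appearing alone) or overlapping spans.
    """
    rng = random.Random(seed)
    mapping = {name: rng.choice(name_pool) for name in sorted(set(person_spans))}
    for old, new in mapping.items():
        passage = passage.replace(old, new)
        answer = answer.replace(old, new)
    return passage, answer

# Example usage with a small, mixed-source substitute pool (illustrative only):
new_passage, new_answer = rename_entities(
    passage="Marie Curie won the Nobel Prize in Physics in 1903.",
    answer="Marie Curie",
    person_spans=["Marie Curie"],
    name_pool=["Ana Nguyen", "Priya Raman", "Xqzv Blorn"],
)
```

Because the same mapping is applied to both the passage and the gold answer, the perturbed example remains answerable; any accuracy drop can therefore be attributed to the model's reliance on the original entity names rather than to a change in the task itself.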