We study the robustness of machine reading comprehension (MRC) models to entity renaming: do models make more incorrect predictions when the same questions are asked about a renamed entity? Such failures imply that models rely too heavily on entity information to answer questions, and thus may generalize poorly when facts about the world change or when questions are asked about novel entities. To systematically audit this issue, we present a pipeline that automatically generates test examples at scale by replacing entity names in the original test examples with names from a variety of sources, ranging from names in the same test set, to common real-world names, to arbitrary strings. Across five datasets and three pretrained model architectures, MRC models consistently perform worse when entities are renamed, with particularly large accuracy drops on datasets constructed via distant supervision. We also find large differences between models: SpanBERT, which is pretrained with span-level masking, is more robust than RoBERTa, despite having similar accuracy on unperturbed test data. We further experiment with different masking strategies as continual pretraining objectives and find that entity-based masking can improve the robustness of MRC models.
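The core perturbation can be illustrated with a minimal sketch. This is not the paper's implementation: the example schema (passage/question/answers), the `NAME_POOL` of substitute names, and the `rename_entity` helper are all hypothetical, and real pipelines must also handle partial mentions (e.g., first name only) and draw substitutes from the several sources listed above. The sketch simply shows the key invariant: every mention of the target entity is replaced consistently across all fields so the perturbed question remains answerable.

```python
import random
import re

# Hypothetical pool of substitute names; the paper draws substitutes from
# several sources (other test-set names, common real-world names, arbitrary strings).
NAME_POOL = ["Alice Nakamura", "Tomás Ferreira", "Priya Raghavan"]


def rename_entity(example: dict, old_name: str, rng: random.Random) -> dict:
    """Replace every mention of `old_name` in the passage, question, and answers
    with one substitute name, keeping all mentions consistent with each other."""
    new_name = rng.choice([n for n in NAME_POOL if n != old_name])
    pattern = re.compile(re.escape(old_name))
    return {
        "passage": pattern.sub(new_name, example["passage"]),
        "question": pattern.sub(new_name, example["question"]),
        "answers": [pattern.sub(new_name, a) for a in example["answers"]],
    }


if __name__ == "__main__":
    # Toy MRC example in an assumed format; entity mentions are given, not detected.
    example = {
        "passage": "Marie Curie won the Nobel Prize in Physics in 1903.",
        "question": "When did Marie Curie win the Nobel Prize in Physics?",
        "answers": ["1903"],
    }
    perturbed = rename_entity(example, "Marie Curie", random.Random(0))
    print(perturbed["passage"])
    print(perturbed["question"])
```

A model that answers the original example correctly but fails on the perturbed one is relying on the entity name rather than on the passage, which is exactly the failure mode the audit is designed to expose.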