Recent studies report that many machine reading comprehension (MRC) models can perform comparably to or even better than humans on benchmark datasets. However, existing works indicate that many MRC models may learn shortcuts to outwit these benchmarks, while their performance in real-world applications remains unsatisfactory. In this work, we attempt to explore why these models learn shortcuts instead of the expected comprehension skills. Based on the observation that a large portion of questions in current datasets have shortcut solutions, we argue that a larger proportion of shortcut questions in training data makes models rely excessively on shortcut tricks. To investigate this hypothesis, we carefully design two synthetic datasets with annotations that indicate whether a question can be answered using a shortcut solution. We further propose two new methods to quantitatively analyze the learning difficulty of shortcut and challenging questions, and to reveal the inherent learning mechanism behind the performance gap between the two kinds of questions. A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions, and that high proportions of shortcut questions in training sets hinder models from exploring sophisticated reasoning skills in the later stage of training.