Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, the internal mechanisms of these models remain unclear, which hinders further understanding of how they work. This paper conducts a series of analytical experiments to examine the relations between multi-head self-attention and the final performance, aiming to probe the potential explainability of PLM-based MRC models. We perform quantitative analyses on SQuAD (English) and CMRC 2018 (Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and ELECTRA in various aspects. We discover that the {\em passage-to-question} and {\em passage understanding} attentions are the most important ones, showing stronger correlations with the final performance than other parts. Through visualizations and case studies, we also observe several general patterns in the attention maps, which could be helpful for understanding how these models solve the questions.
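To make the kind of analysis described above concrete, the following is a minimal sketch (not the authors' code) of how per-layer {\em passage-to-question} attention could be extracted from a BERT-style model with the HuggingFace Transformers library; the model name, example inputs, and head/position averaging scheme are illustrative assumptions rather than the paper's exact setup.

\begin{verbatim}
# Minimal sketch: extract passage-to-question attention from a
# BERT-style model. Model choice and averaging are assumptions,
# not the paper's exact procedure.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_attentions=True)

question = "Where was the treaty signed?"
passage = "The treaty was signed in Paris in 1783."

# BERT-style pair input: [CLS] question [SEP] passage [SEP]
inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
# token_type_ids is 0 for the question segment (incl. [CLS]/[SEP],
# fine for this illustration) and 1 for the passage segment.
token_types = inputs["token_type_ids"][0]
q_idx = (token_types == 0).nonzero(as_tuple=True)[0]
p_idx = (token_types == 1).nonzero(as_tuple=True)[0]

for layer, attn in enumerate(outputs.attentions):
    head_avg = attn[0].mean(dim=0)  # average over heads -> (seq, seq)
    # passage tokens as rows (queries), question tokens as columns (keys)
    p2q = head_avg[p_idx][:, q_idx].mean().item()
    print(f"layer {layer:2d}  mean passage-to-question "
          f"attention: {p2q:.4f}")
\end{verbatim}

Aggregates of this kind, computed per layer and per head over a development set, are one plausible way to quantify how much a given attention component contributes and to correlate it with end-task performance.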