Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, to further improve the reliability of MRC systems, especially for real-life applications, it is necessary to provide not only the answer prediction but also its explanation. In this paper, we propose a new benchmark called ExpMRC for evaluating the explainability of MRC systems. ExpMRC contains four subsets, including SQuAD, CMRC 2018, RACE$^+$, and C$^3$, with additional annotations of the evidence supporting each answer. MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art pre-trained language models to build baseline systems and adopt various unsupervised approaches to extract evidence without a human-annotated training set. The experimental results show that these models are still far from human performance, suggesting that ExpMRC is challenging. Resources will be available through https://github.com/ymcui/expmrc
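To illustrate what an unsupervised evidence-extraction baseline of this kind might look like, below is a minimal sketch that selects the passage sentence with the highest word overlap with the question. The function names and the overlap heuristic are illustrative assumptions, not the paper's exact baselines.

```python
# Minimal sketch of one plausible unsupervised evidence-extraction baseline:
# pick the passage sentence with the highest token overlap with the question.
# Hypothetical helper names; not necessarily the method used in the paper.

import re


def split_sentences(passage: str) -> list:
    # Naive sentence splitter on common end-of-sentence punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?。!?])\s*", passage) if s.strip()]


def extract_evidence(passage: str, question: str) -> str:
    # Score each sentence by the number of question tokens it shares.
    q_tokens = set(re.findall(r"\w+", question.lower()))
    sentences = split_sentences(passage)

    def overlap(sentence: str) -> int:
        s_tokens = set(re.findall(r"\w+", sentence.lower()))
        return len(q_tokens & s_tokens)

    return max(sentences, key=overlap) if sentences else ""


if __name__ == "__main__":
    passage = ("The ExpMRC benchmark covers four subsets. "
               "Each answer is paired with a human-annotated evidence span.")
    question = "What is each answer paired with?"
    print(extract_evidence(passage, question))
```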