Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. However, the conventional task design of MRC lacks explainability beyond model interpretation; that is, a model's reading comprehension cannot be explained in human terms. To address this, this position paper provides a theoretical basis for the design of MRC datasets grounded in psychology and psychometrics, and summarizes it as a set of prerequisites for benchmarking MRC. We conclude that future datasets should (i) evaluate a model's capability to construct a coherent and grounded representation for understanding context-dependent situations and (ii) ensure substantive validity through shortcut-proof questions and explanation as part of the task design.