Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension (MRC), but evidence suggests that the models sometimes exploit dataset biases to make predictions and fail to generalize to out-of-sample data. While many approaches have been proposed to address this issue from the computational perspective, such as new architectures or training procedures, we believe a method that allows researchers to discover biases and adjust the data or the models at an earlier stage would be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist in a dataset before users train the full model. To accompany the toolkit, we also provide a categorization of common biases in MRC.