Machine reading comprehension (MRC) is a crucial task in natural language processing and has seen remarkable progress. However, most neural MRC models are still far from robust and fail to generalize well in real-world applications. To comprehensively evaluate the robustness and generalization of MRC models, we introduce DuReader_robust, a real-world Chinese dataset. It evaluates MRC models from three aspects: over-sensitivity, over-stability, and generalization. Compared with previous work, the instances in DuReader_robust are natural texts rather than artificially altered, unnatural texts, so they reflect the challenges of applying MRC models to real-world applications. The experimental results show that MRC models do not perform well on the challenge test set. Moreover, we analyze the behavior of existing models on the challenge test set, which may offer suggestions for future model development. The dataset and code are publicly available at https://github.com/baidu/DuReader.