In this paper, we present VGaokao, a new verification-style reading comprehension dataset drawn from the Chinese Language tests of the Gaokao. Unlike existing efforts, the new dataset was originally designed to evaluate native speakers, and thus requires more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence using a novel query updating mechanism and adaptively distills supportive evidence, followed by a pairwise competition that pushes models to learn the subtle differences among similar text pieces. Experiments show that our methods outperform various baselines on VGaokao with retrieved complementary evidence, while also offering efficiency and explainability. Our dataset and code are released for further research.
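To make the idea of iteratively selecting complementary evidence with query updating more concrete, the following is a minimal sketch only. The abstract does not specify the retriever or the update rule, so everything here is an assumption for illustration: the function names `tokenize` and `select_evidence` are hypothetical, plain token overlap stands in for whatever scoring the actual approach uses, and "query updating" is approximated by removing statement tokens that earlier evidence already covers.

```python
def tokenize(text):
    # Assumption: whitespace tokenization as a stand-in for a real tokenizer.
    return set(text.lower().split())


def select_evidence(statement, sentences, max_steps=3):
    """Greedily pick sentences that cover still-unmatched statement tokens.

    Illustrative only: after each pick, the covered tokens are removed from
    the query, so the next pick favors complementary rather than redundant
    evidence.
    """
    query = tokenize(statement)
    chosen = []
    for _ in range(max_steps):
        best, best_overlap = None, set()
        for sentence in sentences:
            if sentence in chosen:
                continue
            overlap = query & tokenize(sentence)
            if len(overlap) > len(best_overlap):
                best, best_overlap = sentence, overlap
        if best is None:
            # No remaining sentence matches any uncovered token.
            break
        chosen.append(best)
        query -= best_overlap  # hypothetical "query update" step
    return chosen


sentences = [
    "the author praises the old bridge",
    "the river floods every spring",
    "villagers rebuilt the bridge after the flood",
]
statement = "villagers rebuilt the old bridge"
evidence = select_evidence(statement, sentences)
```

In this toy run, the first pick covers most of the statement, and the updated query then steers the second pick toward the sentence that mentions the one remaining token, rather than toward another sentence that merely repeats already-covered words.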