One emerging research trend in natural language understanding is machine reading comprehension (MRC), the task of finding answers to human questions based on textual data. Existing Vietnamese datasets for MRC research concentrate solely on answerable questions. In reality, however, questions can be unanswerable, meaning that the correct answer is not stated in the given text. To address this weakness, we provide the research community with a benchmark dataset named UIT-ViQuAD 2.0 for evaluating MRC and question answering systems for the Vietnamese language. We use UIT-ViQuAD 2.0 as the benchmark dataset for the challenge on Vietnamese MRC at the Eighth Workshop on Vietnamese Language and Speech Processing (VLSP 2021). The task attracted 77 participant teams from 34 universities and other organizations. In this article, we present the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest scores on the private test set are 77.24% in F1-score and 67.43% in Exact Match. The Vietnamese MRC systems proposed by the top 3 teams use XLM-RoBERTa, a powerful pre-trained language model based on the transformer architecture. The UIT-ViQuAD 2.0 dataset motivates researchers to further explore Vietnamese machine reading comprehension and related tasks such as question answering, question generation, and natural language inference.
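To make the two reported metrics concrete, the following is a minimal sketch of how Exact Match and token-level F1-score are commonly computed for span-based MRC with unanswerable questions. It is an illustration under stated assumptions, not the official VLSP 2021 evaluation script: the function names, the whitespace tokenization, the lowercasing, and the convention that an unanswerable question has an empty gold answer are all assumptions introduced here.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def exact_match(prediction, gold):
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Assumed convention: an unanswerable question has an empty gold answer,
    # and it is scored 1.0 only if the prediction is empty as well.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average both metrics over questions, as percentages.

    predictions: dict mapping question id -> predicted answer string
    references:  dict mapping question id -> list of gold answers
                 (an empty list marks the question as unanswerable)
    """
    em_total = 0.0
    f1_total = 0.0
    for qid, golds in references.items():
        pred = predictions.get(qid, "")
        golds = golds or [""]
        # Take the best score over all gold answers for this question.
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1_score(pred, g) for g in golds)
    n = len(references)
    return {"exact_match": 100 * em_total / n, "f1": 100 * f1_total / n}


if __name__ == "__main__":
    # Hypothetical toy data: one answerable and one unanswerable question.
    references = {"q1": ["Hà Nội"], "q2": []}
    predictions = {"q1": "thành phố Hà Nội", "q2": ""}
    print(evaluate(predictions, references))
```

Under this scheme, a system is rewarded on an unanswerable question only when it abstains, which is why F1-score and Exact Match coincide on such questions and diverge only on answerable ones with partially overlapping spans.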