Large-scale, high-quality corpora are necessary for evaluating machine reading comprehension (MRC) models on a low-resource language like Vietnamese. MRC for the health domain also offers great potential for practical applications, yet there is still very little MRC research in this domain. This paper presents ViNewsQA, a new Vietnamese corpus for evaluating healthcare reading comprehension models. The corpus comprises 22,057 human-generated question-answer pairs. Crowd-workers create the questions and their answers based on a collection of over 4,416 online Vietnamese healthcare news articles, where the answers are spans extracted from the corresponding articles. In particular, we develop a process for creating a Vietnamese machine reading comprehension corpus. Comprehensive evaluations demonstrate that our corpus requires abilities beyond simple reasoning such as word matching, demanding more difficult reasoning over single- or multiple-sentence information. We conduct experiments with different types of machine reading comprehension methods to establish the first baseline performances, against which further models can be compared. We also measure human performance on the corpus and compare it with several powerful neural network-based and transfer learning-based models. Our experiments show that the best machine model is ALBERT, which achieves an exact match (EM) score of 65.26% and an F1-score of 84.89% on our corpus. The substantial gap between human performance and the best-performing model (14.53% in EM and 10.90% in F1-score) on the test set of our corpus indicates that there is room for improvement on ViNewsQA in future studies. Our corpus is publicly available on our website for research purposes, to encourage the research community to make these improvements.
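For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how exact match and token-level F1 are commonly computed for span-extraction corpora. It is an illustration under stated assumptions, not the evaluation script used for ViNewsQA: the function names, the normalization steps (Unicode NFC, lowercasing, removal of ASCII punctuation), and whitespace tokenization are all assumptions made here.

```python
import string
import unicodedata
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalization: NFC, lowercase, strip ASCII punctuation, split on whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall between two answer spans."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both spans.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a predicted span that contains the reference plus extra tokens scores 0 on exact match but receives partial credit on F1, which is one reason F1 figures for the same model are typically higher than EM figures.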