Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted considerable attention from the computational linguistics and artificial intelligence research communities in recent years, driven by the rapid development of models based on machine reading comprehension (MRC). A reader-based QA system is a high-level search engine that uses MRC techniques to find correct answers to queries or questions in open-domain or domain-specific texts. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. By contrast, research on QA systems for a low-resource language such as Vietnamese remains scarce. This paper presents XLMRQA, the first Vietnamese QA system to use a supervised transformer-based reader over a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus). XLMRQA outperforms two robust QA systems built on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. Based on the results obtained with the three systems, we analyze the influence of question types on the performance of the QA systems.