Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted considerable attention from the computational linguistics and artificial intelligence research communities in recent years, owing to the rapid development of models based on machine reading comprehension (MRC). A reader-based QA system is a high-level search engine that uses MRC techniques to find correct answers to queries or questions in open-domain or domain-specific texts. However, most advances in data resources and machine-learning approaches for MRC and QA systems have concentrated on two resource-rich languages, English and Chinese, whereas low-resource languages such as Vietnamese have seen little research on QA systems. This paper presents XLMRQA, the first Vietnamese QA system that uses a supervised transformer-based reader over a Wikipedia-based textual knowledge source (the UIT-ViQuAD corpus). XLMRQA outperforms two strong QA systems built on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. Based on the results obtained on the three systems, we analyze how question types influence the performance of QA systems.
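The abstract describes a reader-based QA system only at a high level: relevant passages are found in a textual knowledge source, and a reader extracts an answer from them. The sketch below illustrates that generic retrieve-then-read pattern under explicit assumptions. The TF-IDF retriever, the `Reader` callable interface, and the names `retrieve`, `best_span` and `answer` are hypothetical illustrations, not XLMRQA's actual components; in a real system the reader would be a fine-tuned transformer that emits per-token start and end scores, which `best_span` shows how to decode.

```python
import math
import re
import unicodedata
from collections import Counter
from typing import Callable

# Hypothetical reader interface: (question, passage) -> (answer, score).
# In a system like the one described, this role is played by a supervised
# transformer-based reader; any callable with this signature can be plugged in.
Reader = Callable[[str, str], tuple[str, float]]


def tokenize(text: str) -> list[str]:
    # NFC-normalise so Vietnamese diacritics are precomposed, then take
    # Unicode-aware word tokens (\w matches accented letters in Python 3).
    return re.findall(r"\w+", unicodedata.normalize("NFC", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[int]:
    """Return indices of the top-k passages under a simple TF-IDF score."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)

    def idf(term: str) -> float:
        df = sum(1 for d in docs if term in d)
        return math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF

    q_terms = set(tokenize(question))
    scores = [sum(d[t] * idf(t) for t in q_terms if t in d) for d in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


def best_span(start_logits: list[float], end_logits: list[float],
              max_len: int = 10) -> tuple[int, int, float]:
    """Pick (i, j) with i <= j < i + max_len maximising start[i] + end[j].

    This is the usual decoding step for an extractive reader that scores
    every token as a possible answer start and answer end.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def answer(question: str, passages: list[str], reader: Reader, k: int = 2) -> str:
    """Retrieve the top-k passages, run the reader on each, keep the best answer."""
    candidates = [reader(question, passages[i])
                  for i in retrieve(question, passages, k)]
    return max(candidates, key=lambda c: c[1])[0]
```

For example, `retrieve("Thủ đô của Việt Nam là gì?", ["Hà Nội là thủ đô của Việt Nam.", "Paris is the capital of France."], k=1)` ranks the Vietnamese passage first, and `best_span([0.1, 2.0, 0.3], [0.0, 0.5, 1.5])` selects tokens 1 to 2. Keeping the reader as a pluggable callable separates the retrieval stage from the reading stage, which is what allows different readers to be compared over the same knowledge source.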