Open-domain Question Answering (OpenQA) is an important task in Natural Language Processing (NLP) that aims to answer questions expressed in natural language based on large-scale collections of unstructured documents. Recently, there has been a surge of research literature on OpenQA, particularly on techniques that integrate neural Machine Reading Comprehension (MRC). While these works have pushed performance on benchmark datasets to new heights, they have rarely been covered in existing surveys of QA systems. In this work, we review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques. Specifically, we begin by revisiting the origin and development of OpenQA systems. We then introduce the modern OpenQA architecture known as "Retriever-Reader" and analyze the various systems that follow this architecture, as well as the specific techniques adopted in each of its components. Next, we discuss the key challenges in developing OpenQA systems and analyze the benchmarks that are commonly used. We hope this work will keep researchers informed of both recent advances and open challenges in OpenQA research, and thereby stimulate further progress in the field.