Textual Question Answering (QA) aims to provide precise answers to users' questions in natural language using unstructured data. One of the most popular approaches to this goal is machine reading comprehension (MRC). In recent years, many novel datasets and evaluation metrics based on classical MRC tasks have been proposed for broader textual QA tasks. In this paper, we survey 47 recent textual QA benchmark datasets and propose a new taxonomy from an application point of view. In addition, we summarize 8 evaluation metrics for textual QA tasks. Finally, we discuss current trends in constructing textual QA benchmarks and suggest directions for future work.