Alongside the huge volume of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "skills" that question answering/reading comprehension systems are supposed to acquire, and propose a new taxonomy. The supplementary materials survey the current multilingual resources and monolingual resources for languages other than English, and we discuss the implications of over-focusing on English. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.