To address the increasing demands of real-world applications, research on knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise. To this end, we propose a new setup for evaluating existing knowledge-intensive tasks in which we generalize the background corpus to a universal web snapshot. We investigate a slate of NLP tasks that rely on knowledge, either factual or common sense, and ask systems to use a subset of CCNet, the Sphere corpus, as a knowledge source. In contrast to Wikipedia, the background corpus most commonly used in KI-NLP, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the web. Despite potential gaps in coverage, challenges of scale, lack of structure and lower quality, we find that retrieval from Sphere enables a state-of-the-art system to match and even outperform Wikipedia-based models on several tasks. We also observe that while a dense index can outperform a sparse BM25 baseline on Wikipedia, on Sphere this is not yet possible. To facilitate further research and minimise the community's reliance on proprietary, black-box search engines, we share our indices, evaluation metrics and infrastructure.
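To make the sparse-versus-dense comparison mentioned above concrete, the following is a minimal, self-contained sketch contrasting the two retrieval paradigms: lexical BM25 scoring versus inner-product search over learned embeddings. The toy corpus, query, and hashing-based `embed` function are hypothetical stand-ins for illustration only; they are not the paper's Sphere index, its BM25 configuration, or its dense bi-encoder.

```python
# Sketch: sparse (BM25) vs. dense (embedding dot-product) retrieval.
# Corpus, query, and embed() are illustrative placeholders, not the
# actual Sphere setup described in the abstract.
import math
from collections import Counter

import numpy as np

corpus = [
    "the eiffel tower is located in paris",
    "bm25 is a classic sparse retrieval baseline",
    "dense retrievers embed queries and passages into vectors",
]
query = "where is the eiffel tower"


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 scores of `query` against each document in `docs`."""
    tokenized = [d.split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    df = Counter(t for d in tokenized for t in set(d))  # document frequencies
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores


def embed(text, dim=64):
    """Hypothetical hashing-based embedding; a real system would use a
    trained bi-encoder and an approximate-nearest-neighbour index at
    web scale."""
    vec = np.zeros(dim)
    for token in text.split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)


sparse = bm25_scores(query, corpus)
dense = [float(embed(query) @ embed(d)) for d in corpus]

print("BM25 ranking :", np.argsort(sparse)[::-1])
print("Dense ranking:", np.argsort(dense)[::-1])
```

The sketch illustrates the trade-off the abstract refers to: BM25 matches only surface tokens and scales cheaply to web-sized corpora, while dense retrieval depends on the quality and coverage of its learned embeddings, which is harder to maintain over a corpus as large and heterogeneous as Sphere.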