As the number of published scholarly articles grows steadily each year, new methods are needed to organize scholarly knowledge so that it can be discovered and used more efficiently. Natural Language Processing (NLP) techniques can autonomously process scholarly articles at scale and create machine-readable representations of the article content. However, autonomous NLP methods are far from accurate enough to create a high-quality knowledge graph, and quality is crucial for the graph to be useful in practice. We present TinyGenius, a methodology for validating NLP-extracted scholarly knowledge statements using crowdsourced microtasks. The scholarly context in which the crowd workers operate poses multiple challenges. The explainability of the employed NLP methods is crucial for providing the context that supports the crowd workers' decision process. We employed TinyGenius to populate a paper-centric knowledge graph using five distinct NLP methods. The resulting knowledge graph serves as a digital library for scholarly articles.