Automatic extraction of funding information from academic articles adds significant value for industry and research communities, for example by tracking research outcomes per funding organization, profiling researchers and universities based on the funding they receive, and supporting open access policies. Identifying and linking funding entities poses two major challenges: (i) the sparse graph structure of the Knowledge Base (KB), which makes commonly used graph-based entity linking approaches suboptimal for the funding domain, and (ii) entities missing from the KB, which, unlike in recent zero-shot approaches, requires marking entity mentions that have no KB entry as NIL. We propose an entity linking model that can perform NIL prediction and overcome data scarcity issues in a time- and data-efficient manner. Our model builds on transformer-based mention detection and a bi-encoder model to perform entity linking. We show that our model outperforms strong existing baselines.
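A minimal sketch of bi-encoder linking with NIL prediction follows. It assumes that mentions and KB entities have already been encoded into vectors by separate encoders. Toy vectors stand in for transformer outputs here. Candidates are scored by inner product, and NIL is assigned when the best score falls below a threshold. The threshold rule, the entity IDs, and the function names are hypothetical illustrations. The abstract does not say how the model actually decides NIL.

```python
from typing import Dict, Sequence

# An embedding vector. In a real bi-encoder, mention and entity vectors would be
# produced by trained transformer encoders; the vectors used below are toy values.
Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity between a mention embedding and an entity embedding."""
    return sum(x * y for x, y in zip(a, b))


def link_mention(
    mention_vec: Vector,
    entity_index: Dict[str, Vector],
    nil_threshold: float,
) -> str:
    """Return the best-scoring KB entity ID, or "NIL" if no score reaches the threshold.

    nil_threshold is an assumed hyperparameter (hypothetical NIL rule); an empty
    entity_index also yields "NIL".
    """
    best_id, best_score = "NIL", float("-inf")
    for entity_id, entity_vec in entity_index.items():
        score = dot(mention_vec, entity_vec)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= nil_threshold else "NIL"


if __name__ == "__main__":
    # Hypothetical precomputed entity embeddings for two funders.
    kb = {
        "funder:A": [0.9, 0.1, 0.0],
        "funder:B": [0.1, 0.8, 0.3],
    }
    print(link_mention([1.0, 0.2, 0.0], kb, nil_threshold=0.5))  # funder:A (score 0.92)
    print(link_mention([0.0, 0.0, 1.0], kb, nil_threshold=0.5))  # NIL (best score 0.3)
```

Because entity vectors are encoded independently of the mention, they can be computed once and reused for every mention. This is one reason bi-encoders are commonly chosen when linking must be efficient.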