Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the PaperswithCode taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.