Recent transformer-based approaches demonstrate promising results on relational scientific information extraction. Existing datasets focus on high-level descriptions of how research is carried out. Instead, we focus on the subtleties of how experimental associations are presented by building SciClaim, a dataset of scientific claims drawn from Social and Behavioral Science (SBS), PubMed, and CORD-19 papers. Our novel graph annotation schema incorporates not only coarse-grained entity spans as nodes and relations as edges between them, but also fine-grained attributes that modify entities and their relations, for a total of 12,738 labels in the corpus. With more label types and more than twice the label density of previous datasets, SciClaim captures causal, comparative, predictive, statistical, and proportional associations over experimental variables, along with their qualifications, subtypes, and evidence. We extend work in transformer-based joint entity and relation extraction to effectively infer our schema, showing the promise of fine-grained knowledge graphs in scientific claims and beyond.
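To make the schema described in the abstract concrete, the following is a minimal sketch of how one annotated claim might be represented: entity spans as nodes, relations as edges, and fine-grained attributes attached to both. All class names, field names, and label strings here (`Entity`, `Relation`, `ClaimGraph`, `"variable"`, `"association"`, `"causal"`, `"measured"`) are illustrative assumptions, not the label inventory or data format actually used by SciClaim.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A coarse-grained entity span, acting as a graph node."""
    start: int   # character offset where the span begins
    end: int     # character offset just past the end of the span
    label: str   # hypothetical coarse label, e.g. "variable"
    attributes: set[str] = field(default_factory=set)  # fine-grained modifiers


@dataclass
class Relation:
    """A directed edge between two entities, referenced by list index."""
    head: int
    tail: int
    label: str   # hypothetical coarse label, e.g. "association"
    attributes: set[str] = field(default_factory=set)  # fine-grained modifiers


@dataclass
class ClaimGraph:
    """One claim sentence together with its annotation graph."""
    text: str
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def span_text(self, index: int) -> str:
        """Return the surface text covered by the entity at `index`."""
        entity = self.entities[index]
        return self.text[entity.start:entity.end]

    def label_count(self) -> int:
        """Count one label per entity, per relation, and per attribute."""
        coarse = len(self.entities) + len(self.relations)
        fine = sum(len(e.attributes) for e in self.entities)
        fine += sum(len(r.attributes) for r in self.relations)
        return coarse + fine


# Hypothetical example claim; the labels are made up for illustration.
graph = ClaimGraph(
    text="Exercise increases heart rate.",
    entities=[
        Entity(0, 8, "variable"),
        Entity(19, 29, "variable", {"measured"}),
    ],
    relations=[Relation(0, 1, "association", {"causal"})],
)
```

Counting attributes alongside entities and relations, as `label_count` does, is one assumed way a corpus-level total such as the 12,738 labels could be tallied; the abstract does not specify the exact counting rule.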