Despite encoding an enormous amount of rich and valuable data, existing data sources are mostly created independently, which poses a significant challenge to their integration. Mapping languages, e.g., RML and R2RML, facilitate the declarative specification of the process of applying metadata and integrating data into a knowledge graph. In addition to expressing correspondences between data sources and a unified schema, mapping rules can include knowledge extraction functions. Combining mapping rules and functions yields a powerful formalism for transparently specifying pipelines that integrate data into a knowledge graph. Surprisingly, these formalisms are not fully adopted, and many knowledge graphs are still created by executing ad-hoc programs that pre-process and integrate data. In this paper, we present EABlock, an approach that integrates Entity Alignment (EA) as part of RML mapping rules. EABlock includes a block of functions that perform entity recognition on textual attributes and link the recognized entities to the corresponding resources in Wikidata, DBpedia, and domain-specific thesauri, e.g., UMLS. EABlock provides agnostic and efficient techniques to evaluate the functions and transfer the mappings, facilitating its application in any RML-compliant engine. We have empirically evaluated the performance of EABlock, and the results indicate that it speeds up knowledge graph creation pipelines that require entity recognition and linking in state-of-the-art RML-compliant engines. EABlock is publicly available as a tool through a GitHub repository (https://github.com/SDM-TIB/EABlock) and a DOI (https://doi.org/10.5281/zenodo.5779773).
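To make the idea of evaluating linking functions independently of the mapping engine more concrete, the following is a minimal Python sketch under stated assumptions. It is not EABlock's implementation: the lookup table `TOY_KB`, the functions `link_entity` and `materialize_links`, the column name `linked_entity`, and the `example.org` IRIs are all hypothetical and stand in for a real entity linker and real knowledge-base resources. The sketch shows one plausible reading of the approach: the function is evaluated once per distinct attribute value, and its result is stored as an extra column that a function-free mapping rule could then reference.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table standing in for a real entity linker.
# The IRIs use the reserved example.org domain; they are placeholders,
# not real Wikidata, DBpedia, or UMLS identifiers.
TOY_KB: Dict[str, str] = {
    "lung cancer": "http://example.org/entity/LungCancer",
    "aspirin": "http://example.org/entity/Aspirin",
}


def link_entity(text: str) -> Optional[str]:
    """Toy stand-in for entity recognition and linking.

    Normalises the text and looks it up; returns None when nothing matches.
    """
    return TOY_KB.get(text.strip().lower())


def materialize_links(
    rows: List[Dict[str, str]],
    attribute: str,
    target: str = "linked_entity",
) -> List[Dict[str, Optional[str]]]:
    """Evaluate the linking function once per distinct attribute value.

    Returns new rows with the result stored under `target`, so that a
    mapping rule without functions could refer to that column directly.
    """
    cache: Dict[str, Optional[str]] = {}
    linked_rows: List[Dict[str, Optional[str]]] = []
    for row in rows:
        value = row[attribute]
        if value not in cache:  # each distinct value is linked only once
            cache[value] = link_entity(value)
        linked_rows.append({**row, target: cache[value]})
    return linked_rows


rows = [
    {"id": "1", "diagnosis": "Lung cancer"},
    {"id": "2", "diagnosis": "Lung cancer"},
    {"id": "3", "diagnosis": "unknown condition"},
]
linked = materialize_links(rows, "diagnosis")
```

In this sketch the repeated value "Lung cancer" is linked only once, and values without a match are kept with an empty link rather than dropped. Whether unmatched values should be kept or filtered out is a design choice the abstract does not specify.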