Scholarly data is growing continuously, containing information about articles from a plethora of venues, including conferences and journals. Many initiatives have been taken to make scholarly data available as Knowledge Graphs (KGs). These efforts to standardize the data and make them accessible have also introduced challenges, such as the exploration of scholarly articles and ambiguous author names. This study specifically targets the problem of Author Name Disambiguation (AND) on scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which leverages Knowledge Graph Embeddings (KGEs) trained on the multimodal literal information contained in these KGs. The framework is based on three components: 1) multimodal KGEs, 2) a blocking procedure, and 3) Hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that the proposed architecture outperforms our baselines by 8-14% in terms of F1 score and achieves competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available on GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://doi.org/10.5281/zenodo.6309855), respectively.
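The last two stages of the pipeline described above (blocking followed by Hierarchical Agglomerative Clustering over embeddings) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy two-dimensional vectors, the last-name blocking key, and the cosine-distance threshold are all assumptions standing in for the trained multimodal KGE vectors and the framework's actual blocking procedure.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def hac(points, threshold):
    """Average-linkage agglomerative clustering with a distance cutoff."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        # find the closest pair of clusters (average pairwise distance)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_distance(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy author mentions: (blocking key, embedding vector) -- hypothetical data
mentions = [
    ("wang", [1.0, 0.1]),   # close to the next mention: likely same author
    ("wang", [0.9, 0.2]),
    ("wang", [-0.2, 1.0]),  # a different "Wang"
    ("smith", [0.5, 0.5]),
]

# Blocking: only mentions sharing a key are ever compared,
# which keeps the quadratic clustering step tractable.
blocks = {}
for idx, (key, _) in enumerate(mentions):
    blocks.setdefault(key, []).append(idx)

results = {}
for key, idxs in blocks.items():
    vectors = [mentions[i][1] for i in idxs]
    clusters = hac(vectors, threshold=0.3)
    results[key] = [[idxs[i] for i in c] for c in clusters]

print(results)  # each cluster = mentions attributed to one author
```

In this sketch, the first two "wang" mentions merge (their embeddings are nearly parallel) while the third remains a separate author, mirroring how AND frameworks resolve homonymous names by embedding similarity rather than by the name string alone.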