The incompleteness of Knowledge Graphs (KGs) is a crucial issue affecting the quality of AI-based services. In the scholarly domain, KGs describing research publications typically lack important information, hindering our ability to analyse and predict research dynamics. In recent years, link prediction approaches based on Knowledge Graph Embedding models have become the primary remedy for this issue. In this work, we present Trans4E, a novel embedding model that is particularly suited to KGs that include N-to-M relations with N$\gg$M. Such relations are typical of KGs that categorize a large number of entities (e.g., research articles, patents, persons) according to a relatively small set of categories. We applied Trans4E to two large-scale knowledge graphs, the Academia/Industry DynAmics (AIDA) knowledge graph and the Microsoft Academic Graph (MAG), to complete the information about Fields of Study (e.g., 'neural networks', 'machine learning', 'artificial intelligence') and affiliation types (e.g., 'education', 'company', 'government'), improving the scope and accuracy of the resulting data. We evaluated our approach against alternative solutions on AIDA, MAG, and four other benchmarks (FB15k, FB15k-237, WN18, and WN18RR). Trans4E outperforms the other models at low embedding dimensions and obtains competitive results at high dimensions.
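The N-to-M distinction can be made concrete by computing, for each relation, the average number of head entities per tail and of tail entities per head, following the 1-to-1 / 1-to-N / N-to-1 / N-to-M categorization commonly used for FB15k-style benchmarks (threshold 1.5). The sketch below is illustrative only and is not part of Trans4E; the relation names `hasTopic` and `hasType`, the helper `relation_cardinality`, and the toy triples are hypothetical, not drawn from AIDA or MAG.

```python
from collections import defaultdict


def relation_cardinality(triples, threshold=1.5):
    """Return {relation: (heads_per_tail, tails_per_head, category)}.

    A side counts as "many" when its average cardinality is >= threshold.
    """
    heads_of = defaultdict(lambda: defaultdict(set))  # relation -> tail -> heads
    tails_of = defaultdict(lambda: defaultdict(set))  # relation -> head -> tails
    for h, r, t in triples:
        heads_of[r][t].add(h)
        tails_of[r][h].add(t)

    names = {("1", "1"): "1-to-1", ("1", "N"): "1-to-N",
             ("N", "1"): "N-to-1", ("N", "N"): "N-to-M"}
    stats = {}
    for r in heads_of:
        hpt = sum(len(s) for s in heads_of[r].values()) / len(heads_of[r])
        tph = sum(len(s) for s in tails_of[r].values()) / len(tails_of[r])
        key = ("N" if hpt >= threshold else "1", "N" if tph >= threshold else "1")
        stats[r] = (hpt, tph, names[key])
    return stats


# Hypothetical toy KG: many papers mapped onto few topics, and
# many organisations mapped onto few affiliation types.
papers = [f"paper{i}" for i in range(6)]
triples = [(p, "hasTopic", t) for p in papers
           for t in ("machine learning", "neural networks")]
triples += [(f"org{i}", "hasType", "education") for i in range(4)]
triples += [("org4", "hasType", "company")]

stats = relation_cardinality(triples)
for rel, (hpt, tph, cat) in stats.items():
    print(f"{rel}: {hpt:.1f} heads/tail, {tph:.1f} tails/head -> {cat}")
```

In this toy graph, `hasTopic` comes out as N-to-M with 6.0 heads per tail against 2.0 tails per head, i.e. the N$\gg$M shape described above, while `hasType` is N-to-1.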