Knowledge Graphs (KGs) have found many applications in industry and academic settings, which in turn have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. Link Prediction (LP), the task of predicting missing facts among entities already in a KG, is a promising and widely studied task aimed at addressing KG incompleteness. Among the recent LP techniques, those based on KG embeddings have achieved very promising performance on some benchmarks. Despite the fast-growing literature on the subject, insufficient attention has been paid to the effect of the various design choices in those methods. Moreover, the standard practice in this area is to report accuracy by aggregating over a large number of test facts in which some entities are over-represented; this allows LP methods to exhibit good performance by attending only to structural properties that involve such entities, while ignoring the remaining majority of the KG. This analysis provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is commonly available in the literature. We experimentally compare the effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report a detailed analysis over the most popular benchmarks in the literature.
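The concern about aggregated accuracy can be illustrated with a toy sketch. It is not the evaluation protocol of this analysis: the entity names, the ranks, and the helper `hits_at_k` are all hypothetical, chosen only to show how a score averaged over test facts can differ from a score averaged per entity when one entity dominates the test set.

```python
from collections import defaultdict

def hits_at_k(ranks, k=1):
    """Fraction of ranks that are <= k (hypothetical helper)."""
    return sum(r <= k for r in ranks) / len(ranks)

# Invented test results: (entity to predict, rank assigned to the correct answer).
# One over-represented entity is predicted perfectly; two rare ones are not.
test_ranks = [("popular", 1)] * 8 + [("rare_a", 50), ("rare_b", 120)]

# Aggregating over all test facts: the popular entity dominates the score.
micro = hits_at_k([rank for _, rank in test_ranks])

# Averaging per entity first: every entity counts equally.
per_entity = defaultdict(list)
for entity, rank in test_ranks:
    per_entity[entity].append(rank)
macro = sum(hits_at_k(ranks) for ranks in per_entity.values()) / len(per_entity)

print(f"Hits@1 over facts:    {micro:.2f}")
print(f"Hits@1 over entities: {macro:.2f}")
```

With these invented numbers, the fact-level score is 0.80 while the entity-level score is about 0.33, even though two of the three entities are never predicted correctly.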