Most knowledge graphs (KGs) are incomplete, which motivates an important research topic: automatically completing knowledge graphs. However, the evaluation of knowledge graph completion (KGC) models often ignores this incompleteness: facts in the test set are ranked against all unknown triplets, which may contain a large number of missing facts not yet included in the KG. Treating all unknown triplets as false is called the closed-world assumption. This closed-world assumption may negatively affect the fairness and consistency of the evaluation metrics. In this paper, we study KGC evaluation under a more realistic setting, namely the open-world assumption, where unknown triplets are considered to include many missing facts not contained in the training or test sets. For the most widely used metrics, such as mean reciprocal rank (MRR) and Hits@K, we point out that their behavior may be unexpected under the open-world assumption. Specifically, even with only a small number of missing facts, their values show a logarithmic trend with respect to the true strength of the model, and thus the metric increase could be insignificant in terms of reflecting the true model improvement. Further, considering the variance, we show that the degradation in the reported numbers may result in incorrect comparisons between different models, where stronger models may have lower metric numbers. We validate this phenomenon both theoretically and experimentally. Finally, we suggest possible causes and solutions for this problem. Our code and data are available at https://github.com/GraphPKU/Open-World-KG .
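For readers unfamiliar with the metrics discussed above, the following minimal sketch (in Python, with hypothetical helper names, not taken from the released code) illustrates how MRR and Hits@K are typically computed from the ranks assigned to test triplets under the closed-world assumption, where every unknown candidate is treated as false when ranking.

```python
import numpy as np

def mrr_hits_at_k(ranks, k=10):
    """Compute MRR and Hits@K from the ranks of test triplets.

    Under the closed-world assumption, each test triplet's rank is its
    1-based position among all candidate triplets, with every unknown
    candidate treated as false. `ranks` is a 1-D array of these positions.
    """
    ranks = np.asarray(ranks, dtype=float)
    mrr = np.mean(1.0 / ranks)          # mean reciprocal rank
    hits_at_k = np.mean(ranks <= k)     # fraction ranked within top K
    return mrr, hits_at_k

# Toy example: ranks of five test triplets produced by some KGC model.
ranks = [1, 3, 12, 2, 50]
mrr, hits10 = mrr_hits_at_k(ranks, k=10)
print(f"MRR = {mrr:.3f}, Hits@10 = {hits10:.2f}")
```

Under the open-world assumption studied in the paper, some of the "false" candidates that push a test triplet's rank down are actually missing true facts, which is the source of the distortion described above.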