The growth of knowledge graph (KG) applications has created a rising need for entity alignment (EA) between heterogeneous KGs extracted from various sources. Recently, graph neural networks (GNNs) have been widely adopted for EA because of their strong ability to capture structural information. However, we observe that the oversimplified settings of common EA datasets are far from real-world scenarios, which obstructs a full understanding of the advances made by recent methods. This raises a question: do existing GNN-based EA methods really make great progress? In this paper, to study the performance of EA methods in realistic settings, we focus on the alignment of highly heterogeneous KGs (HHKGs) (e.g., event KGs and general KGs), which differ in scale and structure and share fewer overlapping entities. First, we remove the unreasonable settings and propose two new HHKG datasets that closely mimic real-world EA scenarios. Then, based on the proposed datasets, we conduct extensive experiments to evaluate representative existing EA methods and reveal interesting findings about the progress of GNN-based EA methods. We find that structural information becomes difficult to exploit, yet remains valuable, when aligning HHKGs. This difficulty leads to inferior performance of existing EA methods, especially GNN-based ones. Our findings shed light on the potential problems of hastily applying GNN-based methods as a panacea for all EA datasets. Finally, we introduce a simple but effective method, Simple-HHEA, which comprehensively utilizes entity name, structure, and temporal information. Experimental results show that Simple-HHEA outperforms previous models on HHKG datasets.