Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. These methods face problems that stem from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been applied to EA. However, the increasing scale of KGs makes it hard for EA models to adopt these normalization processes, which limits their use in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that scales up EA models and enhances their results by applying normalization methods to mini-batches with a high entity equivalence rate. ClusterEA aligns entities between large-scale KGs with three components: stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on these embeddings, a novel ClusterSampler strategy samples highly overlapping mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework and suggest that it can outperform the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
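The idea of normalizing similarity per mini-batch and then fusing the results can be illustrated with a minimal sketch. It is not the ClusterEA implementation: the abstract does not name the normalization method, so Sinkhorn-style row and column normalization is used here as an assumed stand-in; the mini-batches are passed in as plain index arrays rather than produced by ClusterSampler; and a dense NumPy array stands in for the sparse matrices a large-scale system would need. All function names (`sinkhorn_normalize`, `fuse_batches`, `hits_at_1`) and the `temperature` and `n_iter` parameters are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis (keeps dimensions)."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn_normalize(sim, n_iter=10, temperature=0.05):
    """Assumed normalization: alternate row and column normalization
    in the log domain so that no single target dominates every source."""
    log_p = sim / temperature
    for _ in range(n_iter):
        log_p = log_p - _logsumexp(log_p, axis=1)  # rows sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # columns sum to 1
    return np.exp(log_p)


def fuse_batches(src_emb, tgt_emb, batches):
    """Normalize similarity inside each mini-batch, then sum the blocks
    into one global matrix. `batches` is a list of (src_idx, tgt_idx)
    index arrays; pairs that never share a batch keep a score of 0."""
    fused = np.zeros((src_emb.shape[0], tgt_emb.shape[0]))
    for src_idx, tgt_idx in batches:
        block = cosine_similarity(src_emb[src_idx], tgt_emb[tgt_idx])
        fused[np.ix_(src_idx, tgt_idx)] += sinkhorn_normalize(block)
    return fused


def hits_at_1(sim, gold):
    """Fraction of source entities whose top-scored target is the gold one."""
    return float(np.mean(sim.argmax(axis=1) == gold))
```

In this sketch, the quality of the fused matrix depends on how many equivalent pairs land in the same batch, since a pair split across batches receives a score of zero. That is why the abstract emphasizes mini-batches with a high entity equivalence rate.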