Entity alignment (EA) aims to find equivalent entities in different knowledge graphs (KGs). Current EA approaches suffer from scalability issues, which limits their use in real-world EA scenarios. To tackle this challenge, we propose LargeEA to align entities between large-scale KGs. LargeEA consists of two channels: a structure channel and a name channel. For the structure channel, we present METIS-CPS, a memory-saving mini-batch generation strategy that partitions large KGs into smaller mini-batches. Designed as a general tool, LargeEA can adopt any existing EA approach to learn entities' structural features within each mini-batch independently. For the name channel, we first introduce NFF, a name feature fusion method that captures rich name features of entities without any complex training process. We then exploit name-based data augmentation to generate seed alignment without any human intervention. This design fits common real-world scenarios much better, as seed alignment is not always available. Finally, LargeEA derives the EA results by fusing the structural features and name features of entities. Since no widely acknowledged benchmark is available for large-scale EA evaluation, we also develop a large-scale EA benchmark called DBP1M, extracted from real-world KGs. Extensive experiments confirm the superiority of LargeEA against state-of-the-art competitors.
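To make the final fusion step concrete, the following is a minimal sketch rather than the LargeEA implementation. It assumes that each channel produces a source-by-target similarity matrix and that the two are combined by a weighted sum followed by a greedy best-match choice; the abstract does not specify the fusion rule, so the weighted sum, the function names, and the `name_weight` parameter are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def fuse_similarities(struct_sim: Matrix, name_sim: Matrix,
                      name_weight: float = 0.5) -> Matrix:
    """Combine two similarity matrices of the same shape.

    Assumption: a plain weighted sum stands in for the fusion step,
    which the abstract does not describe in detail.
    """
    return [
        [(1.0 - name_weight) * s + name_weight * n
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(struct_sim, name_sim)
    ]


def greedy_align(fused: Matrix) -> List[int]:
    """For each source entity (row), pick the best-scoring target (column)."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Toy scores for two source and two target entities (hypothetical values).
struct_sim = [[0.9, 0.1],
              [0.6, 0.4]]   # structure alone would map both sources to target 0
name_sim = [[0.8, 0.2],
            [0.1, 0.9]]     # names clearly favour target 1 for the second source

alignment = greedy_align(fuse_similarities(struct_sim, name_sim))
```

In this toy case the structure scores alone would send both source entities to the same target, and adding the name scores resolves the conflict, which is the intuition behind combining the two channels.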