As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) that have relevant images attached. We note that current MMEA algorithms uniformly adopt KG-level modality fusion strategies for multi-modal entity representation, ignoring the variation in modality preferences across individual entities and thus hurting robustness to potential noise in the modalities (e.g., blurry images and noisy relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for entity-level feature aggregation. A modal-aware hard entity replay strategy is further proposed to handle entities with vague or noisy modality details. Experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also offers a modest number of parameters, efficient runtime, and good interpretability. Our code and data are available at https://github.com/zjukg/MEAformer.
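To make the entity-level fusion idea concrete, the sketch below is a minimal, hypothetical illustration (not the official MEAformer implementation): a small attention module looks at an entity's modality embeddings jointly and predicts per-entity weights with which the modalities are aggregated into one joint embedding. The class name `EntityLevelModalityFusion`, the layer choices, and the example dimensions are assumptions made for illustration only.

```python
# Hypothetical sketch of entity-level (rather than KG-level) modality fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityLevelModalityFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 1):
        super().__init__()
        # Self-attention over the set of modality embeddings of a single entity,
        # so each modality's score can depend on the other modalities.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # scalar confidence per modality

    def forward(self, modal_feats: torch.Tensor) -> torch.Tensor:
        # modal_feats: (batch, num_modalities, dim),
        # e.g. [graph, relation, attribute, image] embeddings per entity.
        ctx, _ = self.attn(modal_feats, modal_feats, modal_feats)
        # Entity-specific weights over modalities (sum to 1 for each entity).
        weights = F.softmax(self.score(ctx).squeeze(-1), dim=-1)  # (batch, num_modalities)
        # Weighted aggregation into a single joint entity embedding.
        return (weights.unsqueeze(-1) * modal_feats).sum(dim=1)

# Usage: fuse 4 modality embeddings of dimension 128 for a batch of 32 entities.
fusion = EntityLevelModalityFusion(dim=128)
joint = fusion(torch.randn(32, 4, 128))  # -> (32, 128)
```

Because the weights are computed per entity, an entity with a blurry image can down-weight its visual embedding while others keep relying on it, which is the robustness argument made in the abstract.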