Link prediction aims to identify potential missing triples in knowledge graphs. To improve results, several recent studies have introduced multimodal information into link prediction. However, these methods exploit each modality in isolation and neglect the complex interactions between different modalities. In this paper, we aim to better model inter-modality information and thus introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework that preserves modality-specific knowledge while taking advantage of the complementarity between modalities. Instead of directly projecting all modalities into a unified space, our multimodal fusion module keeps the representations of different modalities independent, leverages bilinear pooling for fusion, and incorporates contrastive learning as an additional constraint. Furthermore, the decision fusion module computes a learned weighted average over the predictions of all modalities to better exploit their complementarity. Empirical evaluations on several real-world datasets demonstrate the effectiveness of our approach. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.
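The two fusion stages described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the dimensions, the random placeholder embeddings and scores, and the names `structural`, `visual`, `W`, and `logits` are all assumptions made for the sketch. Bilinear pooling is shown as an outer product of two modality embeddings followed by a linear projection, and decision fusion as a softmax-weighted average over per-modality prediction scores.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative embedding dimension

# Toy entity embeddings for two modalities (placeholder values).
structural = rng.normal(size=d)
visual = rng.normal(size=d)

# Stage 1 -- multimodal fusion via bilinear pooling: the outer product
# captures pairwise interactions between every pair of dimensions
# across the two modalities; it is then flattened and projected back
# to d dimensions with a (here random) projection matrix W.
W = rng.normal(size=(d * d, d)) / d
fused = np.outer(structural, visual).reshape(-1) @ W  # shape (d,)

# Stage 2 -- decision fusion: a learned weighted average over the
# prediction scores of each modality view. Here, 3 modality views
# (e.g. structural, visual, fused) score 5 candidate triples.
scores = rng.normal(size=(3, 5))                  # (views, candidates)
logits = rng.normal(size=3)                       # learnable fusion weights
weights = np.exp(logits) / np.exp(logits).sum()   # softmax over views
final_scores = weights @ scores                   # (candidates,)
```

In a real model, `W` and `logits` would be trained end-to-end, and the contrastive-learning constraint mentioned above would be an additional loss term keeping the per-modality representations distinct.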