Multimodal hate detection, which aims to identify harmful online content such as memes, is crucial for building a healthy internet environment. Previous work has made valuable progress in detecting explicit hateful remarks. However, most approaches neglect implicit harm, which is particularly challenging because explicit textual markers and demographic visual cues are often distorted or missing. The cross-modal attention mechanisms they rely on also suffer from the distributional modality gap and lack logical interpretability. To address these semantic gap issues, we propose TOT, a topology-aware optimal transport framework for deciphering implicit harm in memes, which formulates cross-modal alignment as the problem of solving for optimal transport plans. Specifically, we leverage an optimal transport kernel method to capture complementary information from multiple modalities. The kernel embedding provides a non-linear transformation into a reproducing kernel Hilbert space (RKHS), which is important for eliminating the distributional modality gap. Moreover, we extract topology information from the aligned representations to conduct bipartite graph path reasoning. New state-of-the-art performance on two publicly available benchmark datasets, together with further visual analysis, demonstrates the superiority of TOT in capturing implicit cross-modal alignment.
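To make the idea of casting cross-modal alignment as an optimal transport plan more concrete, the following is a minimal sketch of generic entropic optimal transport solved with Sinkhorn iterations. It is not the TOT method itself: the abstract does not specify the solver, the cost function, or the marginals, so the cosine cost, the uniform marginals, the random stand-in features, and all names here (`sinkhorn_plan`, `cosine_cost`, `text_tokens`, `image_regions`) are assumptions chosen purely for illustration.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y (assumed cost)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularized transport plan between marginals a and b.

    eps is the regularization strength; larger values give smoother plans.
    """
    kernel = np.exp(-cost / eps)      # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)          # rescale to match row marginals
        v = b / (kernel.T @ u)        # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]


# Hypothetical stand-ins for encoder outputs: 5 text tokens, 7 image regions.
rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(5, 16))
image_regions = rng.normal(size=(7, 16))

cost = cosine_cost(text_tokens, image_regions)
a = np.full(5, 1.0 / 5)               # uniform mass over text tokens (assumption)
b = np.full(7, 1.0 / 7)               # uniform mass over image regions (assumption)

plan = sinkhorn_plan(cost, a, b)      # plan[i, j]: mass moved from token i to region j
ot_cost = float((plan * cost).sum())  # total transport cost under the plan
```

Each row of `plan` can be read as a soft assignment of one text token to the image regions, which is the kind of alignment signal that a later graph-reasoning stage could consume. In practice the features would come from trained text and image encoders rather than random draws.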