Sarcasm is a linguistic phenomenon in which the literal meaning of an utterance diverges from its implied intention. Because of this subtlety, it is usually hard to detect from text alone. As a result, multi-modal sarcasm detection has received growing attention in both academia and industry. However, most existing techniques model only the atomic-level inconsistencies between the text input and its accompanying image, ignoring more complex compositions within both modalities. Moreover, they neglect the rich information contained in external knowledge, e.g., image captions. In this paper, we propose a novel hierarchical framework for sarcasm detection that explores both atomic-level congruity, based on a multi-head cross-attention mechanism, and composition-level congruity, based on graph neural networks, where a post with low congruity can be identified as sarcastic. In addition, we investigate the effect of various knowledge resources on sarcasm detection. Evaluation results on a public Twitter-based multi-modal sarcasm detection dataset demonstrate the superiority of our proposed model.
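To make the notion of atomic-level congruity more concrete, the following is a minimal sketch of how text tokens might attend over image patches and how the result might be turned into a single score. It is an illustration under stated assumptions, not the model proposed in the paper: the function names (`cross_attention`, `multi_head_cross_attention`, `atomic_congruity`) are hypothetical, the heads simply slice the feature dimension instead of using learned projections, and the cosine-based score is an assumed stand-in for whatever scoring the full model uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def multi_head_cross_attention(text, image, num_heads):
    """Assumption: heads are plain slices of the feature dimension (no learned weights)."""
    dim = len(text[0])
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = slice(h * head_dim, (h + 1) * head_dim)
        text_h = [tok[part] for tok in text]
        image_h = [patch[part] for patch in image]
        heads.append(cross_attention(text_h, image_h, image_h))
    # Concatenate the per-head outputs back into one vector per text token.
    return [sum((head[i] for head in heads), []) for i in range(len(text))]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def atomic_congruity(text, image, num_heads=2):
    """Hypothetical score: mean cosine between each token and its image-attended view."""
    attended = multi_head_cross_attention(text, image, num_heads)
    return sum(cosine(t, a) for t, a in zip(text, attended)) / len(text)


# Toy features (made up for illustration): two text tokens, two image patches.
text_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
matching_image = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
clashing_image = [[-x for x in patch] for patch in matching_image]

congruent = atomic_congruity(text_tokens, matching_image)
incongruent = atomic_congruity(text_tokens, clashing_image)
```

In this toy setting the image that mirrors the text yields a higher score than the image whose features point the opposite way, which matches the intuition that a post with low congruity is a candidate for sarcasm. A trained model would learn the projections and the decision threshold from data.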