Sarcasm is a peculiar form of sentiment expression in which the surface sentiment differs from the implied sentiment. Sarcasm detection on social media platforms has so far been applied mainly to textual utterances, where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages in which audiovisual content is integrated with text, making the analysis of any single mode in isolation incomplete. In our work, we first study the relationship between the textual and visual aspects of multimodal posts from three major social media platforms, i.e., Instagram, Tumblr, and Twitter, and we run a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators. Moreover, we propose two different computational frameworks for sarcasm detection that integrate the textual and visual modalities. The first approach exploits visual semantics trained on an external dataset and concatenates the semantic features with state-of-the-art textual features. The second method adapts a visual neural network initialized with parameters pre-trained on ImageNet to multimodal sarcastic posts. Results show the positive effect of combining modalities for sarcasm detection across platforms and methods.
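To make the first fusion strategy concrete, the following is a minimal sketch in Python/NumPy of feature-level fusion by concatenation. The feature dimensions, weight vector, and single logistic unit are purely illustrative assumptions, not the paper's actual model or trained parameters: the idea shown is only that textual and visual feature vectors are concatenated into one joint representation before classification.

```python
import numpy as np

def fuse_and_score(text_feats, visual_feats, w, b):
    """Concatenate textual and visual feature vectors, then apply a
    logistic unit to produce a sarcasm probability (illustrative only)."""
    fused = np.concatenate([text_feats, visual_feats])
    return 1.0 / (1.0 + np.exp(-(fused @ w + b)))

# Toy example with made-up dimensions and untrained weights
text_feats = np.array([0.2, 0.7, 0.1, 0.0])   # e.g. 4 textual features
visual_feats = np.array([0.9, 0.3, 0.5])      # e.g. 3 visual semantic features
w = np.zeros(text_feats.size + visual_feats.size)  # placeholder weights
b = 0.0
score = fuse_and_score(text_feats, visual_feats, w, b)
print(score)  # sigmoid(0) = 0.5 with zero weights
```

In a real system the weights would come from a trained classifier and the inputs from text encoders and visual-semantics extractors; concatenation is the simplest late-stage fusion and serves here only to clarify how the two modalities are combined.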