Memes are multimedia documents that combine images and phrases, usually to humorous effect. However, hateful memes also spread hatred within social networks. Automatically detecting hateful memes would help reduce their harmful societal impact. Unlike conventional multimodal tasks, where the visual and textual information is semantically aligned, the challenge of hateful meme detection lies in its unique multimodal information: the modalities in a meme are weakly aligned or even mutually irrelevant, so a model must not only understand the content of the meme but also reason across the modalities. In this paper, we focus on detecting hateful multimodal memes and propose a novel method that incorporates image captioning into the meme detection process. We conducted extensive experiments on multimodal meme datasets and demonstrated the effectiveness of our approach. Our model also achieves promising results on the Hateful Memes detection challenge.