The prevalence of memes on social media has created the need to analyze their underlying meanings and sentiment in order to censor harmful content. Machine-learning-based meme censoring systems call for a semi-supervised learning solution that takes advantage of the large number of unlabeled memes available on the internet and makes the annotation process less demanding. Moreover, the approach needs to exploit multimodal data, as a meme's meaning usually comes from both its image and its text. This research proposes a multimodal semi-supervised learning approach that outperforms other state-of-the-art multimodal semi-supervised and supervised models on two datasets: the Multimedia Automatic Misogyny Identification and Hateful Memes datasets. Building on insights from Contrastive Language-Image Pre-training (CLIP), an effective multimodal learning technique, this research introduces SemiMemes, a novel training method that combines an auto-encoder with a classification task to make use of the abundant unlabeled data.
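To make the described idea concrete, below is a minimal sketch, assuming pre-computed CLIP image and text embeddings, of how an auto-encoder reconstruction objective on unlabeled memes can be combined with a classification objective on labeled memes in one training step. The class and function names (MultimodalAutoEncoderClassifier, semi_supervised_loss), layer sizes, and loss weighting are illustrative assumptions, not the paper's actual SemiMemes architecture.

```python
# Minimal sketch (not the authors' exact SemiMemes model): a shared encoder over
# fused CLIP image/text embeddings, with (a) a decoder trained to reconstruct the
# fused embedding on unlabeled memes and (b) a classifier trained on labeled memes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalAutoEncoderClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, latent_dim=128, num_classes=2):
        super().__init__()
        fused_dim = img_dim + txt_dim
        # Encoder compresses the concatenated image/text embeddings.
        self.encoder = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder reconstructs the fused embedding (auto-encoder branch).
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, fused_dim),
        )
        # Classification head operates on the shared latent representation.
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, img_emb, txt_emb):
        fused = torch.cat([img_emb, txt_emb], dim=-1)
        z = self.encoder(fused)
        return self.decoder(z), self.classifier(z), fused


def semi_supervised_loss(model, labeled_batch, unlabeled_batch, recon_weight=1.0):
    """Cross-entropy on labeled memes + reconstruction loss on unlabeled memes."""
    img_l, txt_l, y = labeled_batch
    _, logits, _ = model(img_l, txt_l)
    cls_loss = F.cross_entropy(logits, y)

    img_u, txt_u = unlabeled_batch
    recon, _, fused = model(img_u, txt_u)
    recon_loss = F.mse_loss(recon, fused)

    return cls_loss + recon_weight * recon_loss


if __name__ == "__main__":
    model = MultimodalAutoEncoderClassifier()
    # Dummy CLIP-sized embeddings: a small labeled batch, a larger unlabeled batch.
    labeled = (torch.randn(8, 512), torch.randn(8, 512), torch.randint(0, 2, (8,)))
    unlabeled = (torch.randn(32, 512), torch.randn(32, 512))
    loss = semi_supervised_loss(model, labeled, unlabeled)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

The point of this sketch is that both branches share one encoder, so the reconstruction objective on plentiful unlabeled memes shapes the same latent space that the classifier uses on the scarce labeled memes; how SemiMemes structures its own modules and losses is detailed in the paper itself.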