Hateful meme classification is a challenging multimodal task that requires complex reasoning and contextual background knowledge. Ideally, we could leverage an explicit external knowledge base to supplement the contextual and cultural information in hateful memes. However, there is no known explicit external knowledge base that provides such contextual information about hate speech. To address this gap, we propose PromptHate, a simple yet effective prompt-based model that prompts pre-trained language models (PLMs) for hateful meme classification. Specifically, we construct simple prompts and provide a few in-context examples to exploit the implicit knowledge in the pre-trained RoBERTa language model. We conduct extensive experiments on two publicly available hateful and offensive meme datasets. Our experimental results show that PromptHate achieves a high AUC of 90.96, outperforming state-of-the-art baselines on the hateful meme classification task. We also perform fine-grained analyses and case studies on various prompt settings and demonstrate the effectiveness of the prompts for hateful meme classification.
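To make the prompting setup concrete, below is a minimal sketch of cloze-style prompting with a masked RoBERTa PLM. It assumes the meme image has already been converted into text (e.g., an OCR'd caption plus an automatically generated image description), and it uses a hypothetical template ("It was <mask>.") with hypothetical label words "good"/"bad"; these specifics are illustrative assumptions, not necessarily the exact template or label words used by PromptHate.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")
model.eval()

# Hypothetical label words: " good" for non-hateful, " bad" for hateful.
# The leading space matters for RoBERTa's byte-level BPE tokenization.
label_words = {"non-hateful": " good", "hateful": " bad"}
label_ids = {
    label: tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))[0]
    for label, word in label_words.items()
}

def classify(meme_text: str, demonstrations: str = "") -> str:
    # Prompt = optional in-context examples + target meme text + cloze template.
    prompt = f"{demonstrations}{meme_text} It was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Locate the <mask> position in the tokenized input.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    # Compare the PLM's scores for the two label words at the mask position.
    scores = {label: logits[tok_id].item() for label, tok_id in label_ids.items()}
    return max(scores, key=scores.get)

# Usage with one in-context demonstration (image already rendered as text).
demos = "Image of a cat wearing a hat. Love everyone equally. It was good. "
print(classify("Image of a crowd. All of them should go back home.", demos))
```

The key design choice this sketch illustrates is that classification reduces to comparing the PLM's logits for a small set of label words at the masked position, so the model's implicit knowledge is queried directly rather than through a newly trained classification head.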