Internet memes have emerged as a popular multimodal medium, yet they are increasingly weaponized to convey harmful opinions through subtle rhetorical devices such as irony and metaphor. Existing detection approaches, including MLLM-based techniques, struggle with these implicit expressions, leading to frequent misjudgments. This paper introduces PatMD, a novel approach that improves harmful meme detection by learning from and proactively mitigating these potential misjudgment risks. Our core idea is to move beyond superficial content-level matching and instead identify the underlying misjudgment risk patterns, proactively guiding MLLMs away from known misjudgment pitfalls. We first construct a knowledge base in which each meme is deconstructed into a misjudgment risk pattern explaining why it might be misjudged, whether by overlooking harmful undertones (a false negative) or by overinterpreting benign content (a false positive). For a given target meme, PatMD retrieves relevant patterns and uses them to dynamically guide the MLLM's reasoning. Experiments on a benchmark of 6,626 memes spanning 5 harmful meme detection tasks show that PatMD outperforms state-of-the-art baselines, achieving average improvements of 8.30\% in F1-score and 7.71\% in accuracy, demonstrating strong generalizability and an improved ability to detect harmful memes.
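To make the retrieve-then-guide pipeline described above concrete, here is a minimal Python sketch. All names (`RiskPattern`, `embed`, `retrieve`, `build_prompt`, the top-k value) are illustrative assumptions rather than the authors' implementation; the encoder is a hash-seeded stand-in for a real sentence embedder, and the final MLLM call is omitted.

```python
# Hypothetical sketch of a PatMD-style loop: index misjudgment risk patterns,
# retrieve the ones most similar to a target meme, and fold them into the
# prompt as explicit pitfalls for the MLLM to check before deciding.
from dataclasses import dataclass
import numpy as np

@dataclass
class RiskPattern:
    description: str       # why similar memes were previously misjudged
    error_type: str        # "false_negative" or "false_positive"
    embedding: np.ndarray  # retrieval key

def embed(text: str) -> np.ndarray:
    """Stand-in text encoder; a real system would use a sentence embedder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve(query: str, kb: list[RiskPattern], k: int = 3) -> list[RiskPattern]:
    """Return the k patterns whose embeddings are closest to the query."""
    q = embed(query)
    return sorted(kb, key=lambda p: -float(q @ p.embedding))[:k]

def build_prompt(meme_text: str, patterns: list[RiskPattern]) -> str:
    """Turn retrieved patterns into explicit misjudgment warnings."""
    warnings = "\n".join(
        f"- Known {p.error_type} risk: {p.description}" for p in patterns
    )
    return (
        "Decide whether this meme is harmful. Before answering, check these "
        "misjudgment pitfalls observed on similar memes:\n"
        f"{warnings}\n\nMeme text: {meme_text}\n"
        "Answer harmful/benign with reasoning:"
    )

if __name__ == "__main__":
    kb = [
        RiskPattern("Irony masked the attack; surface text looked supportive.",
                    "false_negative", embed("ironic praise hiding an attack")),
        RiskPattern("Dark humor on a neutral topic was over-flagged.",
                    "false_positive", embed("dark humor on neutral topic")),
    ]
    print(build_prompt("\"Great job, really.\" [eye-roll image]",
                       retrieve("ironic praise", kb, k=2)))
```

The design point this sketch illustrates is that retrieval operates over descriptions of *why* memes were misjudged, not over raw meme content, so the guidance injected into the prompt targets reasoning failures rather than surface similarity.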