In a context where online platforms have been effectively weaponized around a variety of geopolitical events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging state-of-the-art (SOTA) text and vision models to represent the individual cases. We study the relevance of our modular and explainable models for detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance of example- and prototype-based methods, and of text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of the examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
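To make the distinction between example- and prototype-based reasoning concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes meme embeddings have already been produced by some text/vision encoder, and uses hypothetical labels (0 = benign, 1 = harmful). Example-based reasoning retrieves the k nearest training cases and votes; prototype-based reasoning compares the query to per-class mean embeddings.

```python
# Illustrative sketch only: example- vs. prototype-based classification over
# precomputed multimodal embeddings. The embedding model itself is assumed.
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def example_based_predict(query: np.ndarray, train_emb: np.ndarray,
                          train_labels: np.ndarray, k: int = 5) -> int:
    """k-nearest-neighbour vote over cosine similarity to training cases."""
    sims = l2_normalize(train_emb) @ l2_normalize(query)
    top_k = np.argsort(-sims)[:k]          # indices of the k most similar training memes
    votes = np.bincount(train_labels[top_k])
    return int(np.argmax(votes))           # majority label among the retrieved examples

def prototype_based_predict(query: np.ndarray, train_emb: np.ndarray,
                            train_labels: np.ndarray) -> int:
    """Assign the query to the class whose mean embedding (prototype) is most similar."""
    classes = np.unique(train_labels)
    prototypes = np.stack([train_emb[train_labels == c].mean(axis=0) for c in classes])
    sims = l2_normalize(prototypes) @ l2_normalize(query)
    return int(classes[np.argmax(sims)])

# Toy usage: random vectors stand in for fused text+image embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 512))
train_labels = rng.integers(0, 2, size=100)   # hypothetical binary harmfulness labels
query = rng.normal(size=512)
print(example_based_predict(query, train_emb, train_labels),
      prototype_based_predict(query, train_emb, train_labels))
```

In both variants the prediction is directly attributable to concrete training cases (the retrieved neighbours) or to class prototypes, which is what makes this style of classifier amenable to the comparative, example-level inspection described above.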