Warning: this paper contains content that may be offensive or upsetting. In the current context, where online platforms have been effectively weaponized around a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging state-of-the-art (SOTA) textual and visual models to represent the individual cases. We study the relevance of our modular and explainable models for detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance of example- and prototype-based methods, and of text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of the examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
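To make the example-based reasoning described above concrete, the sketch below shows one common way such a pipeline can be assembled: each meme is represented by concatenated image and text embeddings from a pretrained vision-language model, and a query meme is classified by majority vote over its nearest training cases, which also serve as the retrieved explanation. This is a minimal illustration under assumed choices (CLIP as the SOTA encoder, cosine-similarity k-NN, and the helper names `embed_meme` and `classify_by_examples`); it is not the paper's exact architecture.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed encoder choice; the paper only specifies "textual and visual SOTA models".
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_meme(image_path: str, overlay_text: str) -> torch.Tensor:
    """Represent a meme by concatenating CLIP image and text features (hypothetical helper)."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[overlay_text], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    feat = torch.cat([img_feat, txt_feat], dim=-1)
    return torch.nn.functional.normalize(feat, dim=-1).squeeze(0)

def classify_by_examples(query_emb: torch.Tensor,
                         train_embs: torch.Tensor,
                         train_labels: list,
                         k: int = 5):
    """Example-based reasoning: retrieve the k most similar training memes,
    predict by majority vote, and return the neighbors as the explanation."""
    sims = train_embs @ query_emb                     # cosine similarity on unit vectors
    top_k = torch.topk(sims, k).indices.tolist()
    votes = [train_labels[i] for i in top_k]
    prediction = max(set(votes), key=votes.count)     # majority label among retrieved cases
    return prediction, top_k                          # retrieved indices double as evidence
```

A prototype-based variant would replace the raw training cases with a small set of learned or averaged class prototypes and compare the query against those instead; the comparative interface mentioned in the abstract would then surface either the retrieved examples or the nearest prototypes for inspection.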