Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text and visual modalities. To this end, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to "memes in the wild". In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate the out-of-sample performance of models pre-trained on the Facebook dataset. We find that memes in the wild differ in two key aspects: 1) captions must be extracted via OCR, which injects noise and diminishes the performance of multimodal models, and 2) memes are more diverse than "traditional memes", including screenshots of conversations or text on a plain background. This paper thus serves as a reality check for the current benchmark of hateful meme detection and its applicability to detecting real-world hate.