Detecting online hate is a complex task, and low-performing models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is an emerging challenge for automated detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate performance on hateful language expressed with emoji. Using the test suite, we expose weaknesses in existing hate detection models. To address these weaknesses, we create the HatemojiBuild dataset using a human-and-model-in-the-loop approach. Models built with these 5,912 adversarial examples perform substantially better at detecting emoji-based hate, while retaining strong performance on text-only hate. Both HatemojiCheck and HatemojiBuild are publicly available in our GitHub repository (https://github.com/HannahKirk/Hatemoji). HatemojiCheck, HatemojiBuild, and the final Hatemoji model are also available on Hugging Face (https://huggingface.co/datasets/HannahRoseKirk/).