Online hate speech is usually addressed with Natural Language Processing, through the automatic detection and removal of hateful content. Alongside this approach, counter narratives have emerged as an effective tool that NGOs employ to respond to online hate on social media platforms. For this reason, Natural Language Generation (NLG) is currently being studied as a way to automate counter narrative writing. However, the existing resources needed to train NLG models are limited to 2-turn interactions (a hate speech message and a counter narrative in response), whereas real-life interactions can consist of multiple turns. In this paper, we present a hybrid approach to dialogical data collection, in which human expert annotators intervene on machine-generated dialogues obtained using 19 different configurations. The result of this work is DIALOCONAN, the first dataset comprising over 3000 fictitious multi-turn dialogues between a hater and an NGO operator, covering 6 targets of hate.