Recently, many studies have tried to build generation models that assist counter speakers by suggesting counterspeech to combat the explosive proliferation of online hate. However, since these suggestions come from a vanilla generation model, they might not have the properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi, an ensemble of generative discriminators (GeDi) that guides the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single-attribute results when combining complementary attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls.
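To make the idea of discriminator-guided generation concrete, the following is a minimal sketch of one decoding step, not the paper's implementation. It assumes the GeDi-style reweighting in which the base model's next-token distribution is multiplied by each attribute posterior raised to a strength `omega`, and it assumes that an ensemble is combined by multiplying the posteriors of all attribute discriminators. The function name `guided_next_token_probs`, the toy vocabulary, and all probability values are hypothetical and chosen only for illustration.

```python
import math


def guided_next_token_probs(lm_probs, attr_posteriors, omega=1.0):
    """Reweight a base next-token distribution by attribute posteriors.

    lm_probs:        dict token -> P_LM(token | context) from the base model
    attr_posteriors: list of dicts, one per attribute discriminator,
                     token -> P(attribute | context + token)
    omega:           control strength (assumed shared by all attributes);
                     omega = 0 returns the base distribution unchanged
    """
    eps = 1e-12  # guard against log(0)
    log_scores = {}
    for tok, p in lm_probs.items():
        score = math.log(max(p, eps))
        for posterior in attr_posteriors:
            score += omega * math.log(max(posterior[tok], eps))
        log_scores[tok] = score

    # Normalise in log space for numerical stability.
    peak = max(log_scores.values())
    total = sum(math.exp(s - peak) for s in log_scores.values())
    return {tok: math.exp(s - peak) / total for tok, s in log_scores.items()}


# Toy values (hypothetical): the base model prefers a toxic token.
lm_probs = {"idiot": 0.5, "friend": 0.3, "person": 0.2}
polite = {"idiot": 0.05, "friend": 0.90, "person": 0.60}
nontoxic = {"idiot": 0.02, "friend": 0.95, "person": 0.90}

guided = guided_next_token_probs(lm_probs, [polite, nontoxic], omega=1.0)
```

With these toy numbers, the guided distribution shifts probability mass away from the toxic token and toward the tokens that both discriminators rate highly, while the base model still contributes its own preference. In a real system the posteriors would come from trained class-conditional models evaluated over the full vocabulary at every step.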