Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale, machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly available datasets, we show that finetuning a toxicity classifier on our data substantially improves its performance on human-written data. We also demonstrate that ToxiGen can be used to fight machine-generated toxicity, as finetuning improves the classifier significantly on our evaluation subset.
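To give a concrete sense of the demonstration-based prompting idea, the sketch below shows one way a prompt could be assembled from example statements and completed by a pretrained language model. It is a minimal illustration under stated assumptions, not the pipeline used to build ToxiGen: the stand-in model (gpt2), the placeholder demonstration strings, and the sampling settings are all hypothetical, and the adversarial classifier-in-the-loop decoding step is omitted.

```python
# Minimal sketch of demonstration-based prompting (illustrative only).
# Assumption: gpt2 is used as a small stand-in model and the demonstration
# strings are placeholders; ToxiGen itself was generated with a much larger
# pretrained LM, curated demonstrations, and classifier-in-the-loop decoding.
from transformers import pipeline

# A handful of demonstration statements about one demographic group,
# all sharing the same label (all benign or all toxic), one per line.
demonstrations = [
    "<statement 1 about the group>",
    "<statement 2 about the group>",
    "<statement 3 about the group>",
]

# Stack the demonstrations and leave a dangling bullet for the model to
# complete, so the continuation tends to mimic their style and stance.
prompt = "".join(f"- {d}\n" for d in demonstrations) + "- "

generator = pipeline("text-generation", model="gpt2")
output = generator(
    prompt,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    num_return_sequences=1,
)
# Print only the newly generated continuation, not the prompt itself.
print(output[0]["generated_text"][len(prompt):])
```

In the classifier-in-the-loop variant described above, a toxicity classifier would additionally steer decoding toward statements that the classifier misjudges, which is what makes the resulting examples adversarial.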