ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that annotators struggle to distinguish machine-generated text from human-written language. We also find that 94.5% of toxic examples are labeled as hate speech by human annotators. Using three publicly-available datasets, we show that finetuning a toxicity classifier on our data improves its performance on human-written data substantially. We also demonstrate that ToxiGen can be used to fight machine-generated toxicity as finetuning improves the classifier significantly on our evaluation subset. Our code and data can be found at https://github.com/microsoft/ToxiGen.
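The two generation ideas named in the abstract, demonstration-based prompting and classifier-in-the-loop decoding, can be illustrated with a toy sketch. This is not the authors' implementation: `build_prompt`, `decode`, `toy_lm`, and `toy_classifier` are hypothetical names, the language model and classifier are hard-coded stand-ins, and the scoring rule (language-model log-probability plus a weighted classifier log-probability inside a beam search) is an assumption chosen only to show how a classifier can steer decoding.

```python
import math
from typing import Callable, Dict, List, Tuple


def build_prompt(demonstrations: List[str]) -> str:
    """Format example statements as a list that ends with an open item,
    so a language model would continue with one more statement."""
    return "".join(f"- {d}\n" for d in demonstrations) + "-"


def decode(
    prompt: str,
    next_token_logprobs: Callable[[str, List[str]], Dict[str, float]],
    target_prob: Callable[[str], float],
    steps: int,
    beam_width: int = 2,
    weight: float = 1.0,
) -> str:
    """Beam search whose ranking mixes LM log-probability with the
    classifier's log-probability for the label we steer toward."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, lm_score in beams:
            for tok, lp in next_token_logprobs(prompt, tokens).items():
                new_tokens = tokens + [tok]
                new_lm = lm_score + lp
                # Clamp to avoid log(0) when the classifier is fully confident.
                clf = math.log(max(target_prob(" ".join(new_tokens)), 1e-9))
                candidates.append((new_lm + weight * clf, new_tokens, new_lm))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = [(toks, lm) for _, toks, lm in candidates[:beam_width]]
    return " ".join(beams[0][0])


# Hypothetical stand-ins: a fixed two-word "language model" and a
# keyword "classifier". Real systems would call actual models here.
def toy_lm(prompt: str, tokens: List[str]) -> Dict[str, float]:
    return {"sunny": math.log(0.6), "rainy": math.log(0.4)}


def toy_classifier(text: str) -> float:
    return 0.9 if "rainy" in text else 0.1


prompt = build_prompt(["the weather is mild", "the forecast is clear"])
unsteered = decode(prompt, toy_lm, toy_classifier, steps=1, weight=0.0)
steered = decode(prompt, toy_lm, toy_classifier, steps=1, weight=1.0)
```

With `weight=0.0` the search follows the language model alone; with `weight=1.0` the classifier term changes which continuation ranks first. Pointing the classifier term at the label a detector gets wrong is one way such decoding could be made adversarial, though the exact formulation used for ToxiGen is in the paper and repository, not in this sketch.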

