COLD: A Benchmark for Chinese Offensive Language Detection

Offensive language detection and prevention are becoming increasingly critical for maintaining healthy social platforms and for the safe deployment of language models. Despite extensive NLP research on toxic and offensive language, existing studies focus mainly on English, and few address Chinese because of limited resources. To facilitate Chinese offensive language detection and model evaluation, we collect COLDataset, a Chinese offensive language dataset containing 37k annotated sentences. With this high-quality dataset, we provide a strong baseline classifier, COLDetector, which achieves 81% accuracy on offensive language detection. We also use COLDetector to study the output offensiveness of popular Chinese language models (CDialGPT and CPM). We find that (1) CPM tends to generate more offensive output than CDialGPT, and (2) certain types of prompts, such as anti-bias sentences, trigger offensive outputs more easily. Altogether, our resources and analyses are intended to help detoxify Chinese online communities and evaluate the safety of generative language models. Disclaimer: The paper contains example data that may be considered profane, vulgar, or offensive.

