NoisyHate: Benchmarking Content Moderation Machine Learning Models with Human-Written Perturbations Online

Online texts with toxic content pose a threat on social media and can lead to cyber harassment. Although many platforms have deployed countermeasures, such as machine learning-based hate-speech detection systems, publishers of toxic content can still evade these systems by altering the spelling of toxic words. Such altered words are known as human-written text perturbations. Many prior works have developed techniques that generate adversarial samples to help machine learning models learn to recognize these perturbations. However, a gap remains between machine-generated perturbations and human-written ones. In this paper, we introduce a benchmark test set of human-written perturbations collected online for evaluating toxic speech detection models. We recruited a group of workers to evaluate the quality of the test set and dropped low-quality samples. To check whether the perturbations can be normalized to their clean versions, we applied spell-correction algorithms to the dataset. Finally, we test state-of-the-art language models, such as BERT and RoBERTa, and black-box APIs, such as the Perspective API, on this data to demonstrate that adversarial attacks with real human-written perturbations remain effective.
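To make the evasion and normalization idea concrete, the following is a minimal toy sketch, not the paper's method, dataset, or models. The lexicon `TOXIC_WORDS`, the substitution table `LEET_MAP`, and the functions `is_toxic` and `normalize` are all hypothetical names introduced here for illustration. It shows an exact-match detector being evaded by a spelling perturbation, and a simple character-level normalizer recovering some perturbations but not others.

```python
# Hypothetical toy lexicon and look-alike table; neither comes from the paper.
TOXIC_WORDS = {"idiot", "stupid"}
LEET_MAP = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})


def is_toxic(text: str) -> bool:
    """Toy detector: flags text that contains an exact lexicon match."""
    tokens = text.lower().split()
    return any(tok.strip(".,?") in TOXIC_WORDS for tok in tokens)


def normalize(text: str) -> str:
    """Toy normalizer: map common look-alike characters back to letters."""
    return text.lower().translate(LEET_MAP)


if __name__ == "__main__":
    clean = "you are an idiot"
    perturbed = "you are an 1d10t"      # character substitution
    stretched = "you are an idiooot"    # letter repetition

    print(is_toxic(clean))                 # detector catches the clean form
    print(is_toxic(perturbed))             # perturbation evades exact matching
    print(is_toxic(normalize(perturbed)))  # substitution table recovers it
    print(is_toxic(normalize(stretched)))  # repetition is not recovered
```

The last case hints at why normalization alone may not close the gap: a fixed substitution table only covers the perturbation patterns someone thought to list in advance.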

