Hate speech detection models are typically evaluated on held-out test sets. However, this risks painting an incomplete and potentially misleading picture of model performance because of increasingly well-documented systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, recent research has thus introduced functional tests for hate speech detection models. However, these tests currently only exist for English-language content, which means that they cannot support the development of more effective models in other languages spoken by billions across the world. To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models. MHC covers 34 functionalities across ten languages, which is more languages than any other hate speech dataset. To illustrate MHC's utility, we train and test a high-performing multilingual hate speech detection model, and reveal critical model weaknesses for monolingual and cross-lingual applications.