Due to the sheer volume of online hate, the AI and NLP communities have started building models to detect such hateful content. Recently, multilingual hate has emerged as a major challenge for automated detection, as conversations on social media often involve code-mixing or more than one language. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1-score. While these metrics are useful, it is difficult to identify from them where the model is failing and how to fix it. To enable more targeted diagnostic insights into such multilingual hate speech models, we introduce a set of functionalities for evaluation. These functionalities are inspired by real-world conversations on social media. Considering Hindi as a base language, we craft test cases for each functionality, and we name our evaluation dataset HateCheckHIn. To illustrate the utility of these functionalities, we test the state-of-the-art transformer-based m-BERT model and the Perspective API.
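The contrast the abstract draws between aggregate metrics and functionality-level diagnostics can be sketched in a few lines. The functionality names, labels, and predictions below are purely illustrative (not from HateCheckHIn itself): a single aggregate F1 hides which functionality the model fails on, while per-functionality accuracy exposes it.

```python
from collections import defaultdict

# Hypothetical test cases: (functionality, gold label, model prediction).
# Functionality names and labels are illustrative, not taken from HateCheckHIn.
test_cases = [
    ("slur",        "hateful",     "hateful"),
    ("slur",        "hateful",     "non-hateful"),
    ("negation",    "non-hateful", "hateful"),
    ("negation",    "non-hateful", "non-hateful"),
    ("code-mixing", "hateful",     "non-hateful"),
]

def aggregate_f1(cases, positive="hateful"):
    """Binary F1 over all cases, ignoring the functionality grouping."""
    tp = sum(1 for _, gold, pred in cases if gold == pred == positive)
    fp = sum(1 for _, gold, pred in cases if gold != positive and pred == positive)
    fn = sum(1 for _, gold, pred in cases if gold == positive and pred != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def per_functionality_accuracy(cases):
    """Accuracy broken down by functionality: shows *where* the model fails."""
    totals, correct = defaultdict(int), defaultdict(int)
    for func, gold, pred in cases:
        totals[func] += 1
        correct[func] += int(gold == pred)
    return {f: correct[f] / totals[f] for f in totals}

print(aggregate_f1(test_cases))              # one opaque number for the whole set
print(per_functionality_accuracy(test_cases))  # e.g. reveals code-mixing cases all fail
```

On this toy data the aggregate F1 is a single score, whereas the per-functionality breakdown immediately shows that every code-mixing case is misclassified, which is exactly the kind of targeted diagnostic the functional tests aim to provide.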