Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the development of more effective hate speech detection models in hundreds of languages spoken by billions across the world. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators. To mitigate these issues, we explore data-efficient strategies for expanding hate speech detection into under-resourced languages. In a series of experiments with mono- and multilingual models across five non-English languages, we find that 1) a small amount of target-language fine-tuning data is needed to achieve strong performance, 2) the benefits of using more such data decrease exponentially, and 3) initial fine-tuning on readily-available English data can partially substitute target-language data and improve model generalisability. Based on these findings, we formulate actionable recommendations for hate speech detection in low-resource language settings.