In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of English Reddit comments from communities banned for being offensive, abusive, or hateful, which we have collected and made publicly available. We present a detailed comparison between a general pre-trained language model and its abuse-inclined version, obtained by retraining on posts from the banned communities, on three English datasets for offensive language, abusive language, and hate speech detection tasks. On all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the generic pre-trained language model and its abuse-inclined counterpart across the datasets, indicating that portability is affected by the compatibility of the annotated phenomena.