Transformer-based models such as BERT, XLNet, and XLM-R have achieved state-of-the-art performance across various NLP tasks, including the identification of offensive language and hate speech, an important problem in social media. In this paper, we present fBERT, a BERT model retrained on SOLID, the largest English offensive language identification corpus available, containing over $1.4$ million offensive instances. We evaluate fBERT's performance in identifying offensive content across multiple English datasets, and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
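As a rough illustration of how a released checkpoint of this kind is typically consumed, the sketch below loads a fine-tuned BERT sequence classifier through the Hugging Face `transformers` API and labels a single post. The checkpoint name, label mapping (OFF = 1), and maximum sequence length are assumptions for illustration only, not details stated in this abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical checkpoint identifier; substitute the name the authors release.
MODEL_NAME = "fBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def classify(text: str) -> str:
    """Label a post as offensive (OFF) or not offensive (NOT).

    Assumes index 1 corresponds to the offensive class; check the
    released model card for the actual label mapping.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return "OFF" if logits.argmax(dim=-1).item() == 1 else "NOT"

print(classify("You are all wonderful people!"))
```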