The performance of hate speech detection models depends on the datasets on which they are trained. Existing datasets are mostly built with a limited number of instances or hate domains (the topics that define hate), which hinders large-scale analysis and transfer learning across domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and in a low-resource language, Turkish, each consisting of 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets distributed over five hate domains. Experimental results supported by statistical tests show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance also scales to smaller training sizes: 98% of the performance in English and 97% in Turkish is recovered when only 20% of the training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains, and show that, on average, 96% of a target domain's performance in English and 92% in Turkish is recovered by models trained on other domains. The gender and religion domains generalize best to other domains, while sports generalizes worst.