Accurate detection of hate speech against politicians, policymaking and political ideas is crucial to maintaining democracy and free speech. Unfortunately, the labelled data needed to train hate speech detection models are limited and domain-dependent. In this paper, we address the classification of hate speech against policymakers in Italian tweets, producing the first resource of this type for Italian. We collected and annotated 1264 tweets, examined cases of disagreement between annotators, and performed in-domain and cross-domain hate speech classification with different features and algorithms. We achieved a ROC AUC of 0.83, analyzed the most predictive attributes, and found differences in linguistic features between the anti-policymaker and anti-immigration domains. Finally, we visualized networks of hashtags to capture the topics discussed in hateful and normal tweets.
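ROC AUC can be read as the probability that a randomly chosen hateful tweet receives a higher classifier score than a randomly chosen normal one, with ties counted as half. The sketch below implements only that standard pairwise definition in plain Python. It is not the authors' evaluation code. The function name `roc_auc` and the toy labels and scores are hypothetical and for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: the fraction of (positive, negative) pairs in which the
    positive example (label 1, e.g. hateful) scores higher than the negative one
    (label 0, e.g. normal). Tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): two hateful and two normal tweets.
print(roc_auc([1, 1, 0, 0], [0.8, 0.5, 0.5, 0.1]))
```

In this toy example, three of the four positive/negative pairs are ranked correctly, and one pair is tied, so the score is 3.5 / 4 = 0.875. The double loop is quadratic in the number of examples. That is acceptable for a corpus of a few thousand tweets. A sort-based rank computation scales better on larger data.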