We address the challenging problem of identifying the main sources of hate speech on Twitter. On the one hand, we carefully annotate a large set of tweets for hate speech and apply advanced deep learning to produce high-quality hate speech classification models. On the other hand, we create retweet networks, detect communities, and monitor their evolution through time. This combined approach is applied to three years of Slovenian Twitter data. We report a number of interesting results. Hate speech is dominated by offensive tweets related to political and ideological issues. The share of unacceptable tweets increases moderately over time, from an initial 20% to 30% by the end of 2020. Unacceptable tweets are retweeted significantly more often than acceptable tweets. About 60% of unacceptable tweets are produced by a single right-wing community of only moderate size. Institutional Twitter accounts and media accounts post significantly fewer unacceptable tweets than individual accounts. However, the main sources of unacceptable tweets are anonymous accounts and accounts that were suspended or closed during the last three years.