Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making the development of appropriate countermeasures necessary. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model fine-tuned on a large set of hand-annotated data. Our analysis shows no evidence of the presence of "serial haters", intended as active users posting exclusively hateful comments. Moreover, consistent with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents' community. Interestingly, users loyal to reliable sources use, on average, more toxic language than their counterparts. Finally, we find that the overall toxicity of a discussion increases with its length, measured both in terms of the number of comments and of time. Our results show that, consistent with Godwin's law, online debates tend to degenerate towards increasingly toxic exchanges of views.