In recent years, consensus has grown within academia and in public discourse that Social Media Platforms (SMPs) amplify the spread of hateful and negative-sentiment content. Researchers have identified how hateful content, political propaganda, and targeted messaging have contributed to real-world harms, including insurrections against democratically elected governments, genocide, and the breakdown of social cohesion due to heightened negative discourse towards certain communities in parts of the world. To counter these issues, SMPs have created semi-automated systems that help identify toxic speech. In this paper we analyse the statistical distribution of hateful and negative-sentiment content within a representative Facebook dataset (n = 604,703) scraped from 648 public Facebook pages that identify themselves as proponents (and followers) of far-right Hindutva actors. These pages were identified manually using keyword searches on Facebook and on CrowdTangle, and were classified as far-right Hindutva pages based on page names, page descriptions, and the discourse shared on these pages. We employ state-of-the-art, open-source XLM-T multilingual transformer-based language models to perform sentiment and hate speech analysis of the textual content shared on these pages over a period of 5.5 years. The results show the statistical distributions of the predicted sentiment and hate speech labels, the top actors, and the top page categories. We further discuss the benchmark performance and limitations of these pre-trained language models.
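As an illustration of the kind of aggregation described above, the following minimal sketch computes the share of each predicted label, overall and per page category. It is not the authors' pipeline: the function names, the record fields (`page_category`, `sentiment`), and the toy records are hypothetical placeholders, and the labels are assumed to have already been produced by a classifier such as XLM-T.

```python
from collections import Counter, defaultdict


def label_distribution(labels):
    """Return each label's share of the total as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def distribution_by_group(records, group_key, label_key):
    """Compute a separate label distribution for every value of group_key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[label_key])
    return {group: label_distribution(labels) for group, labels in grouped.items()}


# Toy records for illustration only; these are NOT data from the paper.
# The field names are assumptions made for this sketch.
posts = [
    {"page_category": "news", "sentiment": "negative"},
    {"page_category": "news", "sentiment": "neutral"},
    {"page_category": "news", "sentiment": "negative"},
    {"page_category": "community", "sentiment": "positive"},
]

overall = label_distribution(post["sentiment"] for post in posts)
by_category = distribution_by_group(posts, "page_category", "sentiment")
```

The same two helpers could be applied to hate speech labels by passing a different `label_key`, assuming each record carries that field.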