Social media is a modern person's digital voice to project and engage with new ideas and mobilise communities – a power shared with extremists. Given the societal risks of unvetted content-moderating algorithms for Extremism, Radicalisation, and Hate speech (ERH) detection, responsible software engineering must understand who, what, when, where, and why such models are necessary to protect user safety and free expression. Hence, we propose and examine the unique research field of ERH context mining to unify disjoint studies. Specifically, we evaluate the start-to-finish design process, from socio-technical definition-building and dataset collection strategies to technical algorithm design and performance. Our Systematic Literature Review (SLR) of 51 studies from 2015-2021 provides the first cross-examination of textual, network, and visual approaches to detecting extremist affiliation, hateful content, and radicalisation towards groups and movements. We identify consensus-driven ERH definitions and propose solutions to existing ideological and geographic biases, particularly the lack of research in Oceania/Australasia. Our hybridised investigation of Natural Language Processing, Community Detection, and visual-text models demonstrates the dominant performance of textual transformer-based algorithms. We conclude with vital recommendations for ERH context mining researchers and propose an uptake roadmap with guidelines for researchers, industries, and governments to enable a safer cyberspace.