Content moderation is the process of screening and monitoring user-generated content online. It plays a crucial role on online social platforms in stopping content stemming from unacceptable behaviors such as hate speech, harassment, violence against specific groups, terrorism, racism, xenophobia, homophobia, or misogyny, to name a few. These platforms employ a plethora of tools to detect and manage malicious information; however, malicious actors continuously refine their skills, developing strategies to bypass these barriers and continue spreading misleading information. Twisting and camouflaging keywords are among the most common techniques used to evade platform content moderation systems. In response to this ongoing issue, this paper presents an innovative approach to address this linguistic trend in social networks through the simulation of different content evasion techniques and a multilingual Transformer model for content evasion detection. In this way, we share with the scientific community a multilingual public tool, named "pyleetspeak", to generate/simulate the phenomenon of content evasion in a customizable way through automatic word camouflage, together with a multilingual Named-Entity Recognition (NER) Transformer-based model tuned for its recognition and detection. The multilingual NER model is evaluated in different textual scenarios, detecting different types and mixtures of camouflage techniques, and achieves an overall weighted F1 score of 0.8795. This article contributes significantly to countering malicious information by providing multilingual tools to simulate and detect new methods of content evasion on social networks, making the fight against information disorders more effective.
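To illustrate the kind of word camouflage discussed above, the sketch below shows a minimal leetspeak-style character substitution in Python. This is a simplified illustration only, not the pyleetspeak API: the `LEET_MAP` table, the `camouflage` function, and its parameters are hypothetical names chosen for this example.

```python
import random

# Illustrative (hypothetical) substitution table: each letter maps to
# visually similar symbols commonly used to evade keyword filters.
LEET_MAP = {
    "a": ["4", "@"],
    "e": ["3"],
    "i": ["1", "!"],
    "o": ["0"],
    "s": ["5", "$"],
    "t": ["7"],
}

def camouflage(word, prob=0.5, seed=None):
    """Randomly replace characters of `word` to simulate content-evasion
    spellings; `prob` controls how aggressively letters are substituted."""
    rng = random.Random(seed)
    out = []
    for ch in word.lower():
        subs = LEET_MAP.get(ch)
        if subs and rng.random() < prob:
            out.append(rng.choice(subs))  # pick one camouflaged variant
        else:
            out.append(ch)                # keep the original character
    return "".join(out)

# With prob=1.0 every mappable letter is camouflaged, e.g. "vaccine"
# may become something like "v@cc1n3" (exact output depends on the seed).
print(camouflage("vaccine", prob=1.0, seed=0))
```

Detecting such variants is harder than generating them, which is why the paper pairs this kind of generator with a token-level NER model rather than a fixed keyword list.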