Social media have been deliberately used for malicious purposes, including political manipulation and disinformation. Most research focuses on high-resource languages. However, malicious actors share content across countries and languages, including low-resource ones. Here, we investigate whether and to what extent malicious actors can be detected in low-resource language settings. We discovered that a high number of accounts posting in Tagalog were suspended as part of Twitter's crackdown on interference operations after the 2016 US Presidential election. By combining text embedding and transfer learning, our framework can detect, with promising accuracy, malicious users posting in Tagalog without any prior knowledge of, or training on, malicious content in that language. We first learn an embedding model for each language independently, namely a high-resource language (English) and a low-resource one (Tagalog). Then, we learn a mapping between the two latent spaces to transfer the detection model. We demonstrate that the proposed approach significantly outperforms state-of-the-art models, including BERT, and yields marked advantages in settings with very limited training data, the norm when detecting malicious activity on online platforms.
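The core transfer step, learning a mapping between two independently trained embedding spaces, can be illustrated with a minimal NumPy sketch. This is not the paper's actual model: the data below are synthetic toy vectors, and the mapping shown is a closed-form orthogonal Procrustes alignment, one common way to align two embedding spaces given anchor pairs (e.g., translation-equivalent words).

```python
import numpy as np

# Toy anchor pairs: each row of X is a "Tagalog-space" embedding and the
# corresponding row of Y is its "English-space" counterpart. Dimensions
# and vectors are illustrative placeholders, not real learned embeddings.
rng = np.random.default_rng(0)
d = 4
X = rng.normal(size=(20, d))            # source-language embeddings
true_R, _ = np.linalg.qr(rng.normal(size=(d, d)))
Y = X @ true_R                          # target embeddings (toy ground truth)

# Orthogonal Procrustes: W = argmin ||X W - Y||_F subject to W orthogonal,
# solved in closed form via the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Source embeddings projected into the target space; a detector trained
# on the target (high-resource) space can then score these directly.
mapped = X @ W
```

Once `W` is learned from a small set of anchor pairs, any classifier trained on the high-resource space can be applied to low-resource inputs after projection, which is the sense in which the detection model is transferred without low-resource training labels.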