WhatsApp is a popular messaging app used by over a billion people around the globe. Because of this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam in a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise the spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide range of topics, from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of the spammers themselves. We find that spam is often disseminated by groups of phone numbers, and that spam messages are generally shared over longer durations than non-spam messages. Finally, we devise content- and activity-based detection algorithms that can counter spam.