As businesses increasingly rely on social networking sites to engage with their customers, it is crucial to understand and counter reputation manipulation activities, including fraudulently boosting the number of Facebook page likes using like farms. To this end, several fraud detection algorithms have been proposed, and some deployed by Facebook, that use graph co-clustering to distinguish between genuine likes and those generated by farm-controlled profiles. However, as we show in this paper, these tools do not work well against stealthy farms whose users spread likes over longer timespans and like popular pages, aiming to mimic regular users. We present an empirical analysis of the graph-based detection tools used by Facebook and highlight their shortcomings against more sophisticated farms. Next, we focus on characterizing the content generated by social network accounts on their timelines as an indicator of genuine versus fake social activity. We analyze a wide range of features extracted from timeline posts, which we group into two main classes: lexical and non-lexical. We postulate and verify that, compared to normal users, like farm accounts tend to re-share content more often, use fewer words and a poorer vocabulary, and more frequently generate duplicate comments and likes. We extract relevant lexical and non-lexical features and use them to build a classifier to detect like farm accounts, achieving significantly higher accuracy, namely, at least 99% precision and 93% recall.