The development of stance detection and bot detection methods for social media users relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally contain incomplete user relationships, which hinders research on graph-based account detection. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest raw dataset in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diverse relations. As user features in MGTAB, we use the 20 user property features with the greatest information gain, together with user tweet features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches, and that they perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and suggest potential directions for future research in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
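The selection of property features by information gain mentioned above can be illustrated with a small sketch. The abstract does not say how the gain was computed, so this is only one plausible reading: it assumes discrete feature values (continuous properties would need binning first), and the feature names and toy accounts below are hypothetical, not taken from MGTAB.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average of the label entropy inside each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, k):
    """Rank feature names by information gain and keep the k best."""
    names = rows[0].keys()
    gains = {n: information_gain([r[n] for r in rows], labels) for n in names}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy, made-up accounts (not MGTAB data); feature names are hypothetical.
rows = [
    {"verified": 1, "default_profile": 0, "has_url": 1},
    {"verified": 1, "default_profile": 0, "has_url": 1},
    {"verified": 0, "default_profile": 1, "has_url": 1},
    {"verified": 0, "default_profile": 1, "has_url": 1},
    {"verified": 0, "default_profile": 0, "has_url": 0},
    {"verified": 1, "default_profile": 1, "has_url": 0},
]
labels = ["human", "human", "bot", "bot", "bot", "human"]

# MGTAB keeps the 20 best property features; k=2 suits this toy example.
selected = top_k_features(rows, labels, k=2)
```

On this toy data, `verified` separates the two classes perfectly and ranks first, `default_profile` carries a small gain, and `has_url` carries none, so `selected` keeps the first two.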