Online social network platforms such as Twitter and Sina Weibo have been extremely popular over the past 20 years. Identifying the community structure of a social network platform is essential for exploring and understanding users' interests. However, the rapid development of science and technology has generated vast amounts of social network data, which poses great computational challenges for community detection in large-scale social networks. Here, we propose a novel subsampling spectral clustering algorithm to identify community structures in large-scale social networks with limited computing resources. More precisely, spectral clustering is conducted using only the information of a small subsample of the network nodes, which greatly reduces computational time. As a result, the method can be applied to large-scale datasets even on a personal computer. Specifically, we introduce two sampling techniques: simple random subsampling and degree-corrected subsampling. The methodology is applied to a dataset collected from Sina Weibo, one of the largest Twitter-type social network platforms in China. Our method effectively identifies the community structure of registered users. This community structure information can help Sina Weibo promote advertisements to target users and increase user activity.
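The subsampling idea can be illustrated with a minimal Python/NumPy sketch. It is not the authors' algorithm. Several pieces are assumptions made for illustration: the function names (`subsample_spectral_clustering`, `kmeans`, `planted_partition`); the choice to run the eigendecomposition only on the induced subgraph of the sampled nodes and then assign every other node by its average connectivity to each sampled community; and the reading of degree-corrected subsampling as sampling nodes with probability proportional to degree.

```python
import numpy as np


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Plain Lloyd's k-means with random restarts; returns the lowest-inertia labels."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Squared distance from every row to every center, shape (len(X), k).
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def subsample_spectral_clustering(A, k, m, method="random", seed=0):
    """Hypothetical subsampling spectral clustering on a symmetric adjacency matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    if method == "random":
        # Simple random subsampling: m nodes drawn uniformly without replacement.
        S = rng.choice(n, size=m, replace=False)
    elif method == "degree":
        # Assumed reading of degree-corrected subsampling: P(node i) proportional to degree.
        deg = A.sum(axis=1).astype(float)
        S = rng.choice(n, size=m, replace=False, p=deg / deg.sum())
    else:
        raise ValueError(f"unknown method: {method!r}")

    # Spectral step on the m x m induced subgraph only: top-k eigenvectors by |eigenvalue|.
    vals, vecs = np.linalg.eigh(A[np.ix_(S, S)])
    U = vecs[:, np.argsort(np.abs(vals))[::-1][:k]]
    sub_labels = kmeans(U, k, rng)

    # Extend to all n nodes: average number of edges into each sampled community.
    affinity = np.zeros((n, k))
    for c in range(k):
        members = S[sub_labels == c]
        if len(members) > 0:
            affinity[:, c] = A[:, members].sum(axis=1) / len(members)
    labels = affinity.argmax(axis=1)
    labels[S] = sub_labels  # sampled nodes keep their spectral labels
    return labels


def planted_partition(n, k, p_in, p_out, seed=0):
    """Toy stochastic block model used only to exercise the sketch."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, k, size=n)
    P = np.where(truth[:, None] == truth[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < P, 1)
    A = (upper | upper.T).astype(float)
    return A, truth


if __name__ == "__main__":
    A, truth = planted_partition(400, 2, 0.5, 0.05, seed=1)
    labels = subsample_spectral_clustering(A, k=2, m=100, method="degree", seed=2)
    # With two communities, accuracy is taken up to swapping the two labels.
    print(max(np.mean(labels == truth), np.mean(labels != truth)))
```

The eigendecomposition here costs roughly O(m^3) on the m x m block instead of O(n^3) on the full network. The extension step only reads the columns `A[:, S]`. A dense matrix keeps the example short; at the scale of Sina Weibo one would store the adjacency matrix sparsely.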