部分可观测社会网络中的社区探测 (Community Detection in Partially Observable Social Networks)

from arxiv, 24 pages, 8 figures, 5 tables; to appear in the ACM Transactions on Knowledge Discovery from Data (Please cite our journal version that will appear in an upcoming issue.)

The discovery of community structures in social networks has gained significant attention since it is a fundamental problem in understanding the networks' topology and functions. However, most social network data are collected from partially observable networks with both missing nodes and edges. In this paper, we address a new problem of detecting overlapping community structures in the context of such an incomplete network, where communities in the network are allowed to overlap since nodes belong to multiple communities at once. To solve this problem, we introduce KroMFac, a new framework that conducts community detection via regularized nonnegative matrix factorization (NMF) based on the Kronecker graph model. Specifically, from an inferred Kronecker generative parameter matrix, we first estimate the missing part of the network. As our major contribution to the proposed framework, to improve community detection accuracy, we then characterize and select influential nodes (which tend to have high degrees) by ranking, and add them to the existing graph. Finally, we uncover the community structures by solving the regularized NMF-aided optimization problem in terms of maximizing the likelihood of the underlying graph. Furthermore, adopting normalized mutual information (NMI), we empirically show superiority of our KroMFac approach over two baseline schemes by using both synthetic and real-world networks.

翻译：社会网络中社区结构的发现引起了人们的极大注意,因为社会网络中发现社区结构是理解网络地形和功能的一个根本问题。然而,大多数社会网络数据都是从缺少节点和边缘的可部分观测网络中收集的。在本文件中,我们处理在这样一个不完整的网络中发现重叠社区结构的新问题,因为节点同时属于多个社区,因此网络中的社区可以重叠。为了解决这个问题,我们引入了KroMFac,这是一个新的框架,根据Kronecker图形模型,通过常规化的非负矩阵化因素化(NMF)进行社区检测。具体地说,从一个推断的Kronecker基因参数矩阵中,我们首先估计网络缺失的部分。作为我们对拟议框架的主要贡献,为了提高社区检测准确性,我们然后通过排名来描述和选择有影响力的节点(这些节点往往具有很高的度),然后将其添加到现有的图表中。最后,我们通过解决正规化的NMF的辅助优化优化问题,在最大程度上超越基本图的可能性来发现社区结构。此外,我们采用标准化的相互信息(NMIMI),我们通过合成世界的合成网络的实际优势,通过两个基本基线方案。