People today typically use multiple online social networks (Facebook, Twitter, Google+, LinkedIn, etc.). Each online network represents a subset of their "real" ego-networks. An interesting and challenging problem is to reconcile these online networks, that is, to identify all the accounts belonging to the same individual. Besides providing a richer understanding of social dynamics, the problem has a number of practical applications. At first sight, this problem appears algorithmically challenging. Fortunately, a small fraction of individuals explicitly link their accounts across multiple networks; our work leverages these connections to identify a very large fraction of the network. Our main contributions are to mathematically formalize the problem for the first time, and to design a simple, local, and efficient parallel algorithm to solve it. We are able to prove strong theoretical guarantees on the algorithm's performance on well-established network models (Random Graphs, Preferential Attachment). We also experimentally confirm the effectiveness of the algorithm on synthetic and real social network data sets.
翻译:如今,人们通常使用多个在线社交网络(Facebook、Twitter、Google+、LinkedIn等)。每个在线网络代表着他们“真实”的自我网络的子集。一个有趣而具有挑战性的问题是调和这些在线网络,即识别属于同一个人的所有账户。这个问题除了提供对社会动态的更丰富了解外,还具有若干实际应用。乍看起来,这个问题似乎具有逻辑上的挑战性。幸运的是,一小部分个人在多个网络之间明确连接他们的账户;我们的工作利用这些连接来识别网络的非常大的一部分。我们的主要贡献是在数学上首次将问题正式化,并设计一个简单的、本地的和高效的平行算法来解决这个问题。我们能够证明在完善的网络模型(兰多图、优惠附件)上对算法的性表现有很强的理论保障。我们还实验性地确认合成和真实的社会网络数据集的算法的有效性。