Large-scale network data can pose computational challenges, be expensive to acquire, and compromise the privacy of individuals in social networks. We show that the locations and scales of latent space cluster models can be inferred from the number of connections between groups alone. We demonstrate this modelling approach using synthetic data and apply it to friendships between students collected as part of the Add Health study, eliminating the need for node-level connection data. The method thus protects the privacy of individuals and simplifies data sharing. It also offers performance advantages over node-level latent space models because the computational cost scales with the number of clusters rather than the number of nodes.
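As a minimal sketch of the data reduction described above, the following Python snippet collapses a node-level edge list into counts of connections between groups. It is not the authors' implementation: the function name `aggregate_connections`, the toy edge list, and the group labels are hypothetical, and the inference of cluster locations and scales from these counts is not shown.

```python
from collections import Counter


def aggregate_connections(edges, labels):
    """Count undirected connections within and between groups.

    edges: iterable of (u, v) node pairs (hypothetical toy input).
    labels: mapping from node to group label.
    Returns a Counter keyed by sorted (group, group) pairs.
    """
    counts = Counter()
    for u, v in edges:
        # Sort the pair so that (a, b) and (b, a) share one key.
        pair = tuple(sorted((labels[u], labels[v])))
        counts[pair] += 1
    return counts


# Toy network: five nodes in two groups (illustrative data only).
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
counts = aggregate_connections(edges, labels)
```

Only the aggregated counts would need to be shared, and their size depends on the number of groups rather than the number of nodes.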