To unravel the patterns that drive networks, the most popular models rely on community detection algorithms. However, these approaches are generally unable to reproduce the structural features of the network. This has motivated ongoing efforts to develop models that incorporate such network properties alongside community structure. In this work, we present a probabilistic generative model and an efficient algorithm that both perform community detection and capture reciprocity in networks. Our approach jointly models pairs of edges with exact 2-edge joint distributions, and it provides closed-form analytical expressions for both the marginal and the conditional distributions. We validate our model on synthetic data by recovering communities, performing edge prediction, and generating synthetic networks that replicate the reciprocity values observed in real networks. We also illustrate these findings on two real datasets that are relevant for social scientists and behavioral ecologists. Our method overcomes the limitations of both standard algorithms and recent models that incorporate reciprocity through a pseudo-likelihood approximation. Model parameters are inferred with an expectation-maximization algorithm that is efficient and scalable because it exploits the sparsity of the dataset. We provide an open-source implementation of the code online.
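For readers unfamiliar with the quantity being replicated, the sketch below computes reciprocity for a small directed network. It assumes a common definition, namely the fraction of directed edges (i, j) whose reverse (j, i) is also present. The abstract does not state the exact definition used, and the function name `reciprocity` and the toy edge list are illustrative only, not part of the released code.

```python
def reciprocity(edges):
    """Fraction of directed edges (i, j) whose reverse (j, i) is also present.

    Assumed definition for illustration; self-loops and duplicate edges
    are ignored.
    """
    edge_set = {(i, j) for i, j in edges if i != j}
    if not edge_set:
        return 0.0
    # Count every edge whose reverse is also in the set.
    reciprocated = sum((j, i) in edge_set for (i, j) in edge_set)
    return reciprocated / len(edge_set)


# Toy directed network: the pairs 0<->1 and 2<->3 are reciprocated,
# while 1->2 and 0->3 are not.
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (0, 3)]
r = reciprocity(edges)
```

In this toy network, four of the six directed edges have a reverse edge. A model that treats the two directions of a pair as independent has no direct way to control this ratio, which is the motivation for modeling pairs of edges jointly.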