The growing prevalence of network data across many fields, and the need to extract useful information from them, have spurred rapid development of related models and algorithms. Among the learning tasks for network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention. In many real-world applications, network data come with additional information in the form of node or edge covariates that should ideally be leveraged for inference. In this paper, we add to the limited literature on community detection for networks with covariates by proposing a Bayesian stochastic block model with a covariate-dependent random partition prior. Under our prior, the covariates enter explicitly into the prior distribution on the cluster membership. Our model quantifies the uncertainty of all parameter estimates, including the community membership. Importantly, and unlike the majority of existing methods, our model learns the number of communities via posterior inference rather than assuming it to be known. Our model can be applied to community detection in both dense and sparse networks, with both categorical and continuous covariates, and our MCMC algorithm is efficient and mixes well. We demonstrate the superior performance of our model over existing models in a comprehensive simulation study and in applications to two real datasets.