Real-world networks often come with side information that can improve the performance of network analysis tasks such as clustering. Despite the large number of empirical and theoretical studies of network clustering methods over the past decade, the added value of side information and the methods for incorporating it optimally into clustering algorithms remain comparatively poorly understood. We propose a new iterative algorithm for clustering networks with node-level side information (in the form of covariates) and show that it is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and, in contrast to previously proposed methods, avoids hyperparameter tuning. We confirm our theoretical results in experiments on synthetic data, where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally, we demonstrate the practical relevance of our method on real data.
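To make the setting concrete, the following is a minimal sketch of how data from a contextual stochastic block model might be simulated. It is not the algorithm proposed here, and it rests on assumptions the abstract does not state: edges are drawn independently with probability `p` within a cluster and `q` across clusters, and covariates are drawn from a Gaussian mixture with one mean per cluster. The function name `sample_csbm` and all of its parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_csbm(n, n_clusters, p, q, means, sigma=1.0, seed=0):
    """Draw a graph and node covariates from an assumed contextual SBM.

    Assumptions (not taken from the source): within-cluster edge
    probability p, across-cluster edge probability q, and Gaussian
    covariates with cluster-specific means and isotropic noise sigma.
    """
    rng = np.random.default_rng(seed)

    # Assign each node to a cluster uniformly at random.
    labels = rng.integers(0, n_clusters, size=n)

    # Edge probability depends only on whether two nodes share a cluster.
    same_cluster = labels[:, None] == labels[None, :]
    probs = np.where(same_cluster, p, q)

    # Sample the strict upper triangle, then mirror it so the graph is
    # undirected and has no self-loops.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(int)

    # Covariates: cluster mean plus isotropic Gaussian noise.
    dim = means.shape[1]
    covariates = means[labels] + sigma * rng.standard_normal((n, dim))

    return adjacency, covariates, labels


# Example: two clusters with opposite covariate means (illustrative values).
means = np.array([[1.0, 0.0], [-1.0, 0.0]])
adjacency, covariates, labels = sample_csbm(200, 2, 0.3, 0.05, means)
```

A clustering method for this setting would take `adjacency` and `covariates` as input and try to recover `labels` up to a permutation of the cluster indices.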