The stochastic block model (SBM) is widely studied as a benchmark for graph clustering aka community detection. In practice, graph data often come with node attributes that bear additional information about the communities. Previous works modeled such data by considering that the node attributes are generated from the node community memberships. In this work, motivated by a recent surge of works in signal processing using deep neural networks as priors, we propose to model the communities as being determined by the node attributes rather than the opposite. We define the corresponding model; we call it the neural-prior SBM. We propose an algorithm, stemming from statistical physics, based on a combination of belief propagation and approximate message passing. We analyze the performance of the algorithm as well as the Bayes-optimal performance. We identify detectability and exact recovery phase transitions, as well as an algorithmically hard region. The proposed model and algorithm can be used as a benchmark for both theory and algorithms. To illustrate this, we compare the optimal performances to the performance of simple graph neural networks.
翻译:Translated abstract:
随机块模型(SBM)被广泛研究作为图聚类(即社区检测)的基准。在实践中,图数据通常带有包含关于社区的附加信息的节点属性。以往的研究模拟这种数据,通过考虑节点属性由节点社区成员生成来处理。在本文中,受到使用深度神经网络作为先验的信号处理领域的最近增长的启发,我们建议把社区定义为由节点属性决定,而不是反过来。我们定义了相应的模型,称之为神经先验SBM。我们提出了一种基于统计物理的算法,结合了信念传播和近似消息传递。我们分析了算法的性能以及贝叶斯最优性能。我们确定了检测和精确恢复相变的阶段,以及算法困难区域。所提出的模型和算法可以用作理论和算法的基准。为了说明这一点,我们将最优性能与简单的图神经网络的性能进行了比较。