We investigate the unsupervised node classification problem on random hypergraphs under the non-uniform Hypergraph Stochastic Block Model (HSBM) with two equal-sized communities. In this model, edges appear independently with probabilities depending only on the labels of their vertices. We identify the threshold for strong consistency, expressed in terms of the Generalized Hellinger distance. Below this threshold, strong consistency is impossible, and we derive the Information-Theoretic (IT) lower bound on the expected mismatch ratio. Above the threshold, the parameter space is typically divided into two disjoint regions. When only the aggregated adjacency matrices are accessible, while one-stage algorithms accomplish strong consistency with high probability in the region far from the threshold, they fail in the region closer to the threshold. We propose a new refinement algorithm which, in conjunction with the initial estimation, provably achieves strong consistency throughout the entire region above the threshold, and attains the IT lower bound when below the threshold, proving its optimality. This novel refinement algorithm applies the power iteration method to a weighted adjacency matrix, where the weights are determined by hyperedge sizes and the initial label estimate. Unlike the constant degree regime where a subset selection of uniform layers is necessary to enhance clustering accuracy, in the scenario with diverging degrees, each uniform layer contributes non-negatively to clustering accuracy. Therefore, aggregating information across all uniform layers yields better performance than using any single layer alone.


翻译:本文研究非均匀超图随机块模型下随机超图的无监督节点分类问题,该模型假设两个规模相等的社区。在此模型中,超边的出现概率仅取决于其顶点的标签,且各超边独立生成。我们以广义Hellinger距离的形式确定了强一致性的阈值:低于该阈值时强一致性无法实现,并推导出期望错配率的信息论下界;高于该阈值时,参数空间通常可划分为两个不相交区域。当仅能获取聚合邻接矩阵时,单阶段算法在远离阈值的区域能以高概率实现强一致性,但在接近阈值的区域会失效。我们提出一种新的精化算法,该算法结合初始估计,可证明在整个阈值以上区域实现强一致性,并在阈值以下区域达到信息论下界,从而证明其最优性。该精化算法将幂迭代方法应用于加权邻接矩阵,其权重由超边规模和初始标签估计决定。与常数度情形下需通过均匀层子集选择来提升聚类精度不同,在发散度场景中,每个均匀层对聚类精度均产生非负贡献。因此,聚合所有均匀层的信息比使用单一层能获得更优性能。

0
下载
关闭预览

相关内容

Top
微信扫码咨询专知VIP会员