This paper investigates the fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p1, ..., pk). Each subset of d nodes independently forms an order-d hyperedge with a probability that depends on the ground-truth communities to which those d nodes belong. The goal is to exactly recover the k hidden communities from the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it (apart from a small regime of parameters that will be specified precisely). This threshold is expressed in terms of a quantity that we term the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers, as special cases, prior results for the standard SBM and for the d-HSBM with two symmetric communities. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage applies a hypergraph spectral clustering method to obtain a coarse estimate of the communities, and the second stage refines the label of each node individually via local refinement steps to ensure exact recovery.
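The generative model described above can be illustrated with a minimal sampling sketch. It is not the paper's code: the function names `sample_d_hsbm` and `two_level_prob`, the i.i.d. assignment of community labels according to the relative sizes, and the example probabilities are all assumptions made for illustration. The sketch enumerates every d-subset, so it is only practical for small n.

```python
import itertools
import random


def sample_d_hsbm(n, d, p, edge_prob, seed=0):
    """Sample labels and hyperedges from a d-HSBM (illustrative sketch).

    n         -- number of nodes
    d         -- hyperedge order (every hyperedge has exactly d nodes)
    p         -- relative community sizes (p1, ..., pk)
    edge_prob -- hypothetical callable: sorted tuple of d community labels
                 -> probability in [0, 1] that the subset is a hyperedge
    """
    rng = random.Random(seed)
    k = len(p)
    # Assumption: each node's community is drawn i.i.d. with weights p.
    labels = rng.choices(range(k), weights=p, k=n)
    hyperedges = []
    # Each d-subset becomes a hyperedge independently of all others.
    for subset in itertools.combinations(range(n), d):
        comms = tuple(sorted(labels[v] for v in subset))
        if rng.random() < edge_prob(comms):
            hyperedges.append(subset)
    return labels, hyperedges


def two_level_prob(comms, a=0.6, b=0.05):
    """Hypothetical example: probability a if all d nodes share a community, else b."""
    return a if len(set(comms)) == 1 else b


if __name__ == "__main__":
    labels, edges = sample_d_hsbm(n=12, d=3, p=[0.5, 0.5], edge_prob=two_level_prob)
    print(labels)
    print(len(edges), "hyperedges sampled")
```

Passing the probability rule as a callable keeps the sampler agnostic to how the hyperedge probability depends on the community profile, which is the only part of the model the text above leaves general.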