We consider the problem of multiway clustering in the presence of unknown degree heterogeneity. Such problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitioning in social networks. Allowing for degree heterogeneity provides great flexibility in clustering models, but the extra complexity poses significant challenges in both statistics and computation. Here, we develop a degree-corrected tensor block model with estimation accuracy guarantees. We present the phase transition of clustering performance based on the notion of angle separability, and we characterize three signal-to-noise regimes corresponding to different statistical-computational behaviors. In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater. Further, we develop an efficient polynomial-time algorithm that provably achieves exact clustering under mild signal conditions. The efficacy of our procedure is demonstrated through two data applications, one on the human brain connectome project and another on the Peru Legislation network dataset.