High-order clustering aims to identify heterogeneous substructures in multiway datasets, which arise commonly in neuroimaging, genomics, social network studies, and other fields. The non-convex and discontinuous nature of this problem poses significant challenges for both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods for high-order clustering: the \emph{high-order Lloyd algorithm} (HLloyd) and \emph{high-order spectral clustering} (HSC). We establish convergence guarantees and statistical optimality for the proposed procedures under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterize the statistical-computational trade-off for achieving high-order exact clustering across three signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a "singular-value-gap-free" error bound in tensor estimation, which differ substantially from the matrix spectral analyses in the literature. Finally, we demonstrate the merits of the proposed procedures through extensive experiments on both synthetic and real datasets.
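The abstract names HLloyd but does not spell out its update. The sketch below is only an illustration under our own assumptions, not the authors' implementation. It uses the standard order-3 tensor block model $\mathcal{Y} = \mathcal{S} \times_1 M_1 \times_2 M_2 \times_3 M_3 + \mathcal{Z}$ with i.i.d. Gaussian noise and one-hot membership matrices $M_k$. It then runs a single Lloyd-style relabeling of mode-1 indices while the mode-2 and mode-3 labels are held fixed. The helper names (`membership`, `simulate_tbm`, `lloyd_step_mode1`) are hypothetical. The unweighted squared distance is a simplifying choice and may differ from the weighting used in the paper.

```python
import numpy as np

def membership(labels, r):
    """One-hot membership matrix M (n x r) built from integer cluster labels."""
    M = np.zeros((labels.size, r))
    M[np.arange(labels.size), labels] = 1.0
    return M

def simulate_tbm(S, labels, sigma, rng):
    """Assumed model: Y = S x1 M1 x2 M2 x3 M3 + Z, with Z i.i.d. N(0, sigma^2)."""
    M1, M2, M3 = (membership(z, r) for z, r in zip(labels, S.shape))
    # signal[i, j, k] = S[z1(i), z2(j), z3(k)]
    signal = np.einsum("abc,ia,jb,kc->ijk", S, M1, M2, M3)
    return signal + sigma * rng.standard_normal(signal.shape)

def lloyd_step_mode1(Y, labels, r):
    """One Lloyd-style reassignment of mode-1 labels (modes 2 and 3 held fixed)."""
    Ws = []
    for z, rk in zip(labels, r):
        M = membership(z, rk)
        counts = np.maximum(M.sum(axis=0), 1.0)  # guard against empty clusters
        Ws.append(M / counts)  # column a averages over the members of cluster a
    # Average Y over the current mode-2 and mode-3 clusters: shape (n1, r2, r3).
    B = np.einsum("ijk,jb,kc->ibc", Y, Ws[1], Ws[2])
    # Estimated block means for each mode-1 cluster: shape (r1, r2, r3).
    S_hat = np.einsum("ibc,ia->abc", B, Ws[0])
    # Squared distance from each mode-1 slice to each block-mean pattern: (n1, r1).
    dist = ((B[:, None, :, :] - S_hat[None, :, :, :]) ** 2).sum(axis=(2, 3))
    return dist.argmin(axis=1)

# Usage: corrupt 5 of 30 mode-1 labels, then take one relabeling step.
rng = np.random.default_rng(0)
n, r = 30, 3
labels = [np.arange(n) % r for _ in range(3)]
S = 3.0 * rng.standard_normal((r, r, r))
Y = simulate_tbm(S, labels, sigma=1.0, rng=rng)
noisy = labels[0].copy()
noisy[:5] = (noisy[:5] + 1) % r
new = lloyd_step_mode1(Y, [noisy, labels[1], labels[2]], (r, r, r))
```

In this well-separated toy setting, one step is typically enough to repair the corrupted mode-1 labels. A full procedure would cycle over all three modes and repeat until the labels stop changing. HSC would supply the initial labels rather than a hand-corrupted ground truth.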