This paper investigates the computational and statistical limits in clustering matrix-valued observations. We propose a low-rank mixture model (LrMM), adapted from the classical Gaussian mixture model (GMM) to treat matrix-valued observations, which assumes that the population center matrices are low-rank. A computationally efficient clustering method is designed by integrating Lloyd's algorithm with low-rank approximation. Once well initialized, the algorithm converges fast and achieves an exponential-type clustering error rate that is minimax optimal. Meanwhile, we show that a tensor-based spectral method delivers a good initial clustering. As in GMM, the minimax optimal clustering error rate is determined by the separation strength, i.e., the minimal distance between population center matrices. By exploiting low-rankness, the proposed algorithm needs only a weaker requirement on the separation strength. Unlike GMM, however, the statistical and computational difficulty of LrMM is characterized by the signal strength, i.e., the smallest non-zero singular values of the population center matrices. We provide evidence that no polynomial-time algorithm is consistent when the signal strength is not strong enough, even if the separation strength is strong. The performance of our low-rank Lloyd algorithm is further demonstrated under sub-Gaussian noise. Intriguing differences between estimation and clustering under LrMM are discussed. The merits of the low-rank Lloyd algorithm are confirmed by comprehensive simulation experiments. Finally, our method outperforms others in the literature on real-world datasets.
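To make the idea of combining Lloyd's algorithm with low-rank approximation concrete, the following is a minimal sketch rather than the paper's exact procedure. It assumes the number of clusters `K` and a common target rank `r` are known, uses a truncated SVD of each cluster mean as the low-rank step, and takes the initial labels as given (the abstract obtains them from a tensor-based spectral method, which is not reproduced here). The names `truncate_rank` and `low_rank_lloyd` are hypothetical.

```python
import numpy as np


def truncate_rank(M, r):
    """Best rank-r approximation of M in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def low_rank_lloyd(X, labels, K, r, n_iter=20):
    """Lloyd-style iterations with rank-r cluster centers (illustrative sketch).

    X      : array of shape (n, d1, d2), one matrix per observation
    labels : initial cluster assignment with values in {0, ..., K-1}
    K, r   : assumed known number of clusters and target rank
    """
    labels = np.asarray(labels).copy()
    centers = np.zeros((K,) + X.shape[1:])
    for _ in range(n_iter):
        # Center step: average each cluster, then project onto rank-r matrices.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the previous center if a cluster is empty
                centers[k] = truncate_rank(members.mean(axis=0), r)
        # Assignment step: nearest center in squared Frobenius distance.
        diffs = X[:, None, :, :] - centers[None, :, :, :]
        dists = np.sum(diffs ** 2, axis=(2, 3))
        new_labels = np.argmin(dists, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, centers
```

The only change relative to plain Lloyd iterations is the call to `truncate_rank` in the center step. Discarding the trailing singular directions of each cluster mean removes part of the noise, which is the intuition behind the weaker separation requirement stated in the abstract.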