We establish the minimax risk for parameter estimation in sparse high-dimensional Gaussian mixture models and show that a constrained maximum likelihood estimator (MLE) achieves minimax optimality. However, the optimization-based constrained MLE is computationally intractable because the problem is nonconvex. We therefore propose a Bayesian approach for estimating high-dimensional Gaussian mixtures whose cluster centers are sparse, using a continuous spike-and-slab prior, and prove that the posterior contraction rate of the proposed method is minimax optimal. The mis-clustering rate is obtained as a by-product using tools from matrix perturbation theory. Computationally, posterior inference can be implemented via an efficient Gibbs sampler with data augmentation, circumventing the challenging nonconvex optimization required by frequentist algorithms. The proposed Bayesian sparse Gaussian mixture model does not require pre-specifying the number of clusters, which is allowed to grow with the sample size and can be adaptively estimated via posterior inference. The validity and usefulness of the proposed method are demonstrated through simulation studies and the analysis of a real-world single-cell RNA sequencing dataset.
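To make the Gibbs-sampling idea concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: the number of clusters `K` is fixed (the proposed model does not require this), the within-cluster covariance is the identity, and the continuous spike-and-slab prior is a two-component Gaussian mixture with hypothetical variances `v0` (spike) and `v1` (slab). The function name `gibbs_sparse_gmm` and all hyperparameter defaults are illustrative. The cluster labels `z` and the coordinate-wise inclusion indicators `xi` play the role of the augmented latent variables.

```python
import numpy as np

def gibbs_sparse_gmm(Y, K, n_iter=200, v0=0.01, v1=10.0, seed=0):
    """Illustrative Gibbs sampler for a sparse Gaussian mixture (fixed K, identity covariance)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    mu = Y[rng.choice(n, size=K, replace=False)].copy()  # initialize centers at data points
    w = np.full(K, 1.0 / K)                              # mixing weights
    xi = np.ones(p, dtype=bool)                          # True = coordinate is in the slab
    theta = 0.5                                          # prior inclusion probability
    incl = np.zeros(p)                                   # post-burn-in inclusion counts
    burn = n_iter // 2
    for it in range(n_iter):
        # 1. Cluster labels: P(z_i = k) is proportional to w_k * exp(-||y_i - mu_k||^2 / 2).
        d2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = np.log(w) - 0.5 * d2
        logp -= logp.max(axis=1, keepdims=True)
        prob = np.exp(logp)
        prob /= prob.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = np.minimum((prob.cumsum(axis=1) < u[:, None]).sum(axis=1), K - 1)
        # 2. Mixing weights: Dirichlet(1 + n_k) under a flat Dirichlet prior.
        counts = np.bincount(z, minlength=K)
        w = rng.dirichlet(1.0 + counts)
        # 3. Cluster centers: conjugate Gaussian update, coordinate by coordinate.
        sums = np.eye(K)[z].T @ Y                        # (K, p) per-cluster sums
        prior_var = np.where(xi, v1, v0)
        prec = counts[:, None] + 1.0 / prior_var[None, :]
        mu = sums / prec + rng.standard_normal((K, p)) / np.sqrt(prec)
        # 4. Inclusion indicators: compare slab and spike densities of column mu[:, j].
        ss = (mu ** 2).sum(axis=0)
        log_slab = np.log(theta) - 0.5 * K * np.log(v1) - 0.5 * ss / v1
        log_spike = np.log1p(-theta) - 0.5 * K * np.log(v0) - 0.5 * ss / v0
        p_slab = np.exp(-np.logaddexp(0.0, log_spike - log_slab))
        xi = rng.random(p) < p_slab
        # 5. Inclusion probability: Beta update under a uniform prior.
        theta = rng.beta(1 + xi.sum(), 1 + p - xi.sum())
        if it >= burn:
            incl += xi
    return z, mu, incl / (n_iter - burn)
```

The function returns the last draw of the labels and centers together with the post-burn-in inclusion frequency of each coordinate. Labels are identifiable only up to permutation, so any comparison with a reference clustering should account for label switching.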