We introduce a new approach for estimating the number of spikes in a general class of spiked covariance models without directly computing the eigenvalues of the sample covariance matrix. This approach is based on the Lanczos algorithm and the asymptotic properties of the associated Jacobi matrix and its Cholesky factorization. A key aspect of the analysis is interpreting the eigenvector spectral distribution as a perturbation of its asymptotic counterpart. The specific exponential-type asymptotics of the Jacobi matrix enable an efficient approximation of the Stieltjes transform of the asymptotic spectral distribution via a finite continued fraction. As a consequence, we also obtain estimates for the density of the asymptotic distribution and the location of outliers. We provide consistency guarantees for our proposed estimators, proving their convergence in the high-dimensional regime. We demonstrate that, when applied to standard spiked covariance models, our approach outperforms existing methods in computational cost and runtime, while remaining robust to exotic population covariances.
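To make the link between the Lanczos algorithm, the Jacobi matrix, and the finite continued fraction concrete, here is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the estimator proposed in the paper. The function names `lanczos_jacobi`, `stieltjes_continued_fraction`, and `density_estimate` are hypothetical, the matrix is a small dense list of rows, and the continued fraction is simply cut off after `k` Lanczos steps rather than truncated using the exponential-type asymptotics the abstract refers to.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos_jacobi(A, b, k):
    """Run up to k Lanczos steps on a symmetric matrix A (list of rows),
    started from vector b. Returns (alphas, betas): the diagonal and
    off-diagonal entries of the resulting Jacobi (tridiagonal) matrix."""
    scale = math.sqrt(dot(b, b))
    basis = [[x / scale for x in b]]
    alphas, betas = [], []
    for j in range(k):
        q = basis[j]
        w = [dot(row, q) for row in A]  # w = A q_j
        alphas.append(dot(w, q))
        # Full reorthogonalization: simple and stable, adequate for a small demo.
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < 1e-12:  # last step, or invariant subspace reached
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas


def stieltjes_continued_fraction(alphas, betas, z):
    """Evaluate the finite continued fraction
        1 / (a_1 - z - b_1^2 / (a_2 - z - b_2^2 / (...)))
    from the bottom up. This equals the (1,1) entry of (T - z)^{-1} for the
    Jacobi matrix T, i.e. the Stieltjes transform of its spectral measure
    under the convention m(z) = integral of 1/(x - z)."""
    tail = 0.0
    for j in reversed(range(len(alphas))):
        denom = alphas[j] - z - tail
        tail = betas[j - 1] ** 2 / denom if j > 0 else 1.0 / denom
    return tail


def density_estimate(alphas, betas, x, eta=1e-2):
    """Smoothed density at x: Im m(x + i*eta) / pi (assumed smoothing width eta)."""
    m = stieltjes_continued_fraction(alphas, betas, complex(x, eta))
    return m.imag / math.pi
```

As a sanity check on the sign convention: for `A = diag(1, 2, 3)` and a start vector of all ones, three Lanczos steps recover the spectral measure exactly, so `stieltjes_continued_fraction(alphas, betas, -1.0)` agrees with (1/3)(1/2 + 1/3 + 1/4) = 13/36 up to rounding. Evaluating the fraction costs O(k) per point once the Jacobi entries are known, which is why no eigenvalue computation is needed.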