Estimation of the number of spiked eigenvalues in a covariance matrix by bulk eigenvalue matching analysis

The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determining the number of spiked eigenvalues, $K$. For the estimation of $K$, most attention has focused on the top eigenvalues of the sample covariance matrix, and there has been little investigation into proper ways of using the bulk eigenvalues to estimate $K$. We propose a principled approach to incorporating bulk eigenvalues in the estimation of $K$. Our method imposes a working model on the residual covariance matrix, which is assumed to be a diagonal matrix whose entries are drawn from a gamma distribution. Under this model, the bulk eigenvalues are asymptotically close to the quantiles of a fixed parametric distribution. This motivates a two-step method: the first step uses the bulk eigenvalues to estimate the parameters of this distribution, and the second step uses these parameters to assist the estimation of $K$. The resulting estimator $\hat{K}$ aggregates information from a large number of bulk eigenvalues. We show the consistency of $\hat{K}$ under a standard spiked covariance model. We also propose a confidence interval estimate for $K$. Our extensive simulation studies show that the proposed method is robust and outperforms existing methods in a range of scenarios. We apply the proposed method to the analysis of a lung cancer microarray data set and the 1000 Genomes data set.
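The two-step idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes the degenerate special case in which all residual variances equal one constant $\sigma^2$, so that the bulk eigenvalues are matched to quantiles of the Marchenko-Pastur law instead of the gamma-based parametric distribution described above. The function names (`mp_quantiles`, `simulate_spiked_data`, `estimate_num_spikes`), the trimming fraction `alpha`, and the ad hoc `margin` on the threshold are all hypothetical choices made for illustration, and the data are assumed to have mean zero with $p < n$.

```python
import numpy as np


def mp_quantiles(probs, gamma, grid_size=20000):
    """Quantiles of the unit-variance Marchenko-Pastur law, ratio gamma = p/n < 1.

    The CDF is obtained by trapezoidal integration of the density on a grid,
    then inverted by linear interpolation.
    """
    a = (1.0 - np.sqrt(gamma)) ** 2  # lower edge of the bulk
    b = (1.0 + np.sqrt(gamma)) ** 2  # upper edge of the bulk
    x = np.linspace(a, b, grid_size)
    dens = np.sqrt(np.clip((b - x) * (x - a), 0.0, None)) / (2.0 * np.pi * gamma * x)
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(x)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]  # remove small numerical integration error
    return np.interp(probs, cdf, x)


def simulate_spiked_data(n, p, spikes, sigma2, seed=0):
    """Draw n mean-zero Gaussian samples whose covariance is diagonal:
    sigma2 everywhere, plus the given spike sizes on the first coordinates."""
    rng = np.random.default_rng(seed)
    variances = np.full(p, sigma2, dtype=float)
    variances[: len(spikes)] += np.asarray(spikes, dtype=float)
    return rng.standard_normal((n, p)) * np.sqrt(variances)


def estimate_num_spikes(X, alpha=0.2, margin=0.1):
    """Simplified two-step estimate of the number of spiked eigenvalues.

    Step 1: fit sigma2 by least squares, matching the middle (bulk) sample
            eigenvalues to Marchenko-Pastur quantiles.
    Step 2: count eigenvalues above the fitted bulk edge, inflated by an
            ad hoc margin (an assumption of this sketch).
    """
    n, p = X.shape
    gamma = p / n
    sample_cov = X.T @ X / n  # assumes the columns of X have mean zero
    evals = np.sort(np.linalg.eigvalsh(sample_cov))[::-1]  # descending order

    # Step 1: the k-th largest eigenvalue is paired with the (1 - k/(p+1))-quantile.
    k = np.arange(1, p + 1)
    bulk = (k > alpha * p) & (k <= (1.0 - alpha) * p)
    q = mp_quantiles(1.0 - k[bulk] / (p + 1), gamma)
    sigma2_hat = float(q @ evals[bulk] / (q @ q))  # regression through the origin

    # Step 2: threshold the top eigenvalues using the fitted parameter.
    threshold = sigma2_hat * (1.0 + np.sqrt(gamma)) ** 2 * (1.0 + margin)
    return int(np.sum(evals > threshold)), sigma2_hat


# Example usage on synthetic data with three spikes.
X = simulate_spiked_data(n=1000, p=200, spikes=[10.0, 8.0, 6.0], sigma2=2.0)
k_hat, sigma2_hat = estimate_num_spikes(X)
print(k_hat, round(sigma2_hat, 3))
```

Fitting the scale from the middle of the spectrum keeps the estimate largely insensitive to the spikes themselves, which is the motivation for using bulk rather than top eigenvalues in the first step. A faithful implementation would replace the Marchenko-Pastur quantiles with those of the distribution implied by the gamma working model and would calibrate the threshold rather than use a fixed margin.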
