We study the extreme eigenvalues of the sample covariance matrix $Q=YY^*$ under the generalized elliptical model $Y=\Sigma^{1/2}XD$. Here $\Sigma$ is a bounded $p \times p$ positive definite deterministic matrix representing the population covariance structure, $X$ is a $p \times n$ random matrix that has either independent columns sampled from the unit sphere in $\mathbb{R}^p$ or i.i.d. centered entries with variance $n^{-1}$, and $D$ is a diagonal random matrix with i.i.d. entries that is independent of $X$. This model has important applications in statistics and machine learning. In this paper, assuming that $p$ and $n$ are comparably large, we prove that the edge eigenvalues of $Q$ can asymptotically follow several types of distributions, depending on $\Sigma$ and $D$. These distributions include Gumbel, Fr\'echet, Weibull, Tracy-Widom, Gaussian, and their mixtures. On the one hand, when the random variables in $D$ have unbounded support, the edge eigenvalues of $Q$ follow either a Gumbel or a Fr\'echet distribution, depending on the tail decay of $D$. On the other hand, when the random variables in $D$ have bounded support, under some mild regularity assumptions on $\Sigma$, the edge eigenvalues of $Q$ can exhibit Weibull, Tracy-Widom, or Gaussian distributions, or mixtures of these. Based on our theoretical results, we consider two applications. First, we propose statistics and a procedure to detect and estimate possible spikes in elliptically distributed data. Second, in the context of a factor model, we use a multiplier bootstrap procedure, implemented by selecting the weights in $D$, to propose a new algorithm for inferring and estimating the number of factors. Numerical simulations confirm the accuracy and power of the proposed methods and show improved performance over some existing methods in the literature.
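The model $Y=\Sigma^{1/2}XD$ can be illustrated with a small simulation. The sketch below is only a minimal illustration under stated assumptions, not the paper's procedure: it assumes real Gaussian entries for $X$ (the i.i.d. case with variance $n^{-1}$), a diagonal $\Sigma$, and two arbitrary example laws for the entries of $D$ (a bounded uniform law and an unbounded Pareto-type law). The function name `sample_extreme_eigenvalues` and all parameter choices are hypothetical.

```python
import numpy as np

def sample_extreme_eigenvalues(p, n, sigma_diag, d_sampler, rng):
    """Return (smallest, largest) eigenvalue of Q = Y Y^T, Y = Sigma^{1/2} X D.

    Assumptions for this sketch: Sigma is diagonal (given by sigma_diag),
    X has i.i.d. real Gaussian entries with variance 1/n, and d_sampler
    draws the n diagonal entries of D independently of X.
    """
    # X: p x n matrix with i.i.d. centered entries of variance 1/n.
    X = rng.standard_normal((p, n)) / np.sqrt(n)
    # Diagonal entries of D, drawn independently of X.
    d = d_sampler(rng, n)
    # Sigma^{1/2} rescales the rows of X; D rescales the columns.
    Y = np.sqrt(sigma_diag)[:, None] * X * d[None, :]
    Q = Y @ Y.T
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(Q)
    return eigvals[0], eigvals[-1]

rng = np.random.default_rng(0)
p, n = 100, 200                 # p and n of comparable size
sigma_diag = np.ones(p)         # assumed: identity population covariance

# Assumed example laws for the entries of D (not taken from the paper).
bounded = lambda rng, n: rng.uniform(0.5, 1.0, size=n)   # bounded support
heavy = lambda rng, n: rng.pareto(2.0, size=n) + 1.0     # unbounded, heavy tail

lo_b, hi_b = sample_extreme_eigenvalues(p, n, sigma_diag, bounded, rng)
lo_h, hi_h = sample_extreme_eigenvalues(p, n, sigma_diag, heavy, rng)
print("bounded D:", lo_b, hi_b)
print("heavy-tailed D:", lo_h, hi_h)
```

Repeating the draw many times and plotting a histogram of the largest eigenvalue would give a rough empirical view of how the edge behavior changes with the support and tail of $D$. Identifying the limiting law still requires the centering and scaling derived in the paper.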