Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores, which yield low-dimensional embeddings of samples. This gap hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces, which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel $\ell_p$ analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special cases. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These results also provide optimal recovery guarantees for Gaussian mixture and stochastic block models as special cases.
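To make the hollowed version of PCA concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that "hollowing" means zeroing the diagonal of the Gram matrix before the eigendecomposition, and it clusters a symmetric two-component Gaussian mixture by the sign of the leading eigenvector. The function names, the noise model, and all parameter values are illustrative assumptions.

```python
import numpy as np


def hollowed_pca_scores(X, k):
    """Top-k eigenpairs of the hollowed Gram matrix of X (rows are samples).

    Assumption: hollowing = setting the diagonal of X X^T to zero.
    """
    G = X @ X.T                      # n x n Gram matrix
    np.fill_diagonal(G, 0.0)         # hollowing step (in place)
    eigvals, eigvecs = np.linalg.eigh(G)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvals[top], eigvecs[:, top]


def two_cluster_demo(seed=0, n=200, d=50, signal=3.0):
    """Misclassification rate of sign-based clustering on synthetic data.

    Hypothetical setup: x_i = y_i * mu + noise, with per-coordinate noise
    scales drawn at random to mimic heteroscedastic noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1, 1], size=n)
    mu = np.zeros(d)
    mu[0] = signal
    noise_scale = rng.uniform(0.5, 1.5, size=d)
    X = labels[:, None] * mu[None, :] + rng.normal(size=(n, d)) * noise_scale

    _, vecs = hollowed_pca_scores(X, k=1)
    est = np.sign(vecs[:, 0])
    est[est == 0] = 1                # break exact ties arbitrarily

    # Eigenvectors are defined up to sign, so compare against both labelings.
    return min(np.mean(est != labels), np.mean(est != -labels))


if __name__ == "__main__":
    print("misclassification rate:", two_cluster_demo())
```

The entries of the leading eigenvector play the role of the principal component scores discussed above; thresholding them at zero is the simplest spectral clustering rule for two balanced clusters.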