An $\ell_p$ theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of the individual principal component scores that yield low-dimensional embeddings of samples. This hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces, which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel $\ell_p$ analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special cases. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These results also provide optimal recovery guarantees for Gaussian mixture models and stochastic block models as special cases.
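The abstract only names the ingredients, so the following is a hedged illustration rather than the authors' code. It assumes that "hollowing" means zeroing the diagonal of the Gram matrix $XX^\top$, and it uses a symmetric two-component Gaussian mixture with a different noise level for each sample as the heteroscedastic setting. Each sample is assigned to a cluster by the sign of its entry in the leading eigenvector of the hollowed Gram matrix, i.e. its first principal component score. The helper names (`hollowed_gram`, `spectral_two_cluster`, `misclustering_rate`), the dimensions, and the signal strength are hypothetical choices.

```python
import numpy as np


def hollowed_gram(X):
    """Gram matrix X X^T with its diagonal set to zero.

    Assumption: this is what 'hollowing' refers to; the diagonal entries
    ||x_i||^2 are dropped before the eigen-decomposition.
    """
    G = X @ X.T
    np.fill_diagonal(G, 0.0)  # in-place: remove the self inner products
    return G


def spectral_two_cluster(X):
    """Hypothetical helper (not the paper's code): split the rows of X into two
    groups by the sign of the leading eigenvector of the hollowed Gram matrix."""
    G = hollowed_gram(X)
    eigvals, eigvecs = np.linalg.eigh(G)  # symmetric matrix, eigenvalues ascending
    u = eigvecs[:, -1]                    # leading eigenvector (PC scores up to scaling)
    labels = np.where(u >= 0, 1, -1)
    return labels, u


def misclustering_rate(labels, truth):
    """Fraction of mislabeled samples, minimized over a global sign flip."""
    err = np.mean(labels != truth)
    return min(err, 1.0 - err)


# Assumed synthetic setup: x_i = y_i * mu + sigma_i * z_i with y_i in {-1, +1}.
rng = np.random.default_rng(0)
n, d = 200, 400
truth = rng.choice([-1, 1], size=n)
mu = np.zeros(d)
mu[0] = 5.0                               # signal strength ||mu|| = 5
sigmas = rng.uniform(0.5, 1.5, size=n)    # heteroscedastic: per-sample noise levels
X = truth[:, None] * mu[None, :] + sigmas[:, None] * rng.standard_normal((n, d))

labels, u = spectral_two_cluster(X)
rate = misclustering_rate(labels, truth)
print(f"misclustering rate: {rate:.3f}")
```

The reason for dropping the diagonal in this sketch: under heteroscedastic noise, $\|x_i\|^2$ is inflated by a sample-specific amount of roughly $d\sigma_i^2$ that carries no cluster information, while the off-diagonal entries $\langle x_i, x_j\rangle$ have expectation $y_i y_j \|\mu\|^2$ regardless of the noise levels.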


Related content

PCA


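In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space onto a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.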
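A minimal NumPy sketch of the procedure just described, assuming the common SVD-based way of computing PCA (the paragraph itself does not prescribe an algorithm). The right singular vectors of the centered data are the successive best-fitting, mutually orthogonal directions, and the scores along different directions come out uncorrelated. The helper name `pca` and the toy data are illustrative assumptions.

```python
import numpy as np


def pca(X, k):
    """Return the top-k principal directions, the sample scores, and their variances.

    X: (n_samples, n_features) data matrix. Each row of `components` is a unit
    vector, and the rows are mutually orthogonal.
    """
    Xc = X - X.mean(axis=0)                         # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                             # principal directions (orthonormal rows)
    scores = Xc @ components.T                      # principal component scores
    explained_var = S[:k] ** 2 / (X.shape[0] - 1)   # variance along each direction
    return components, scores, explained_var


# Assumed toy data: 500 correlated points in 3-D.
rng = np.random.default_rng(1)
A = np.array([[3.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.3]])
X = rng.standard_normal((500, 3)) @ A

components, scores, explained_var = pca(X, k=3)

# Scores along different components are uncorrelated: their covariance is diagonal.
cov_scores = np.cov(scores, rowvar=False)
print(np.round(cov_scores, 6))
```

The first direction also maximizes variance: projecting the centered data onto any other unit vector gives a variance no larger than `explained_var[0]`, which is the variance-maximization view of the same best-fitting line.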
