A New Basis for Sparse Principal Component Analysis

Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse. We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation. The simplest version of the algorithm initializes with the leading $k$ principal components. Then, the principal components are rotated with a $k \times k$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but rather a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids "deflation" and the multiple tuning parameters that deflation requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that, for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance than alternative methods. Through three applications (sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks), we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
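The three steps named in the abstract (leading $k$ principal components, a $k \times k$ orthogonal rotation, soft-thresholding) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which rotation criterion is used, so varimax is assumed here, and the function names, the fixed `threshold` value, and the random test data are all hypothetical.

```python
import numpy as np

def leading_components(X, k):
    """Return the p x k matrix of leading principal directions of X (n x p)."""
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def varimax_rotation(V, max_iter=100, tol=1e-8):
    """Return a k x k orthogonal R chosen by the varimax criterion on V @ R.

    Assumption: varimax stands in for the unspecified rotation in the abstract.
    """
    p, k = V.shape
    R = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        L = V @ R
        # Gradient-like term of the varimax criterion.
        G = V.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        U, s, Wt = np.linalg.svd(G)
        R = U @ Wt                               # product of orthogonal factors
        previous, objective = objective, s.sum()
        if previous > 0 and objective < previous * (1 + tol):
            break
    return R

def soft_threshold(Y, t):
    """Shrink every entry toward zero by t; entries with |y| <= t become 0."""
    return np.sign(Y) * np.maximum(np.abs(Y) - t, 0.0)

def rotated_sparse_pca(X, k, threshold):
    """Hypothetical helper chaining the three steps: PCA, rotate, threshold."""
    V = leading_components(X, k)                 # step 1: leading k components
    R = varimax_rotation(V)                      # step 2: k x k orthogonal rotation
    Y = soft_threshold(V @ R, threshold)         # step 3: soft-thresholding
    return Y, R

# Illustrative usage on synthetic data (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
Y_sparse, R = rotated_sparse_pca(X, k=3, threshold=0.1)
```

Because the rotation is applied to all $k$ components at once, each column of the result is a mixture of the leading eigenvectors rather than a single one, and no deflation loop is needed. Note that soft-thresholding shrinks entries, so the columns of `Y_sparse` are generally no longer exactly orthonormal.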


Related content

PCA


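In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.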
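As a small illustration of this definition, the sketch below computes principal components with a singular value decomposition and returns the projected data. The function name `pca` and the synthetic data are hypothetical; using the SVD of the centered data is one common way to compute PCA, assumed here for convenience.

```python
import numpy as np

def pca(X, k):
    """Return (components, scores, explained_variance) for the top k directions.

    X is an n x p data matrix; components is p x k with orthonormal columns.
    """
    Xc = X - X.mean(axis=0)                          # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T                            # orthogonal basis vectors
    scores = Xc @ components                         # data in the new basis
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance

# Illustrative usage: correlated 3-D points projected onto 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
components, scores, explained_variance = pca(X, k=2)
```

The covariance matrix of `scores` is diagonal up to floating-point error, which is the "uncorrelated dimensions" property described above, and the entries of `explained_variance` are in non-increasing order.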
