We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: one that uses a fixed threshold to induce sparsity and one that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang, and Tan et al., the fixed-threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by any of the evaluated approaches. EESPCA is a practical and effective technique for sparse PCA, with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or the application of resampling techniques that require repeated calculation of sparse PCs.
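For readers unfamiliar with the identity the abstract refers to, the short NumPy sketch below numerically checks the eigenvector-from-eigenvalues formula (Denton et al.), which relates a squared eigenvector loading of a Hermitian matrix to the eigenvalues of the full matrix and of the sub-matrix with one row and column removed. This is only an illustrative check of the identity, not the authors' EESPCA implementation; the random matrix, seed, and indices i and j are arbitrary choices made for the example.

```python
import numpy as np

# Eigenvector-from-eigenvalues identity: for a Hermitian matrix A with
# eigenvalues lambda_k(A), eigenvectors v_i, and principal sub-matrix M_j
# obtained by deleting row and column j,
#   |v_{i,j}|^2 * prod_{k != i} (lambda_i(A) - lambda_k(A))
#       = prod_k (lambda_i(A) - lambda_k(M_j)).
# The matrix, seed, and indices below are illustrative only.

rng = np.random.default_rng(0)
n = 5
B = rng.normal(size=(n, n))
A = (B + B.T) / 2                      # random real symmetric (Hermitian) matrix

eigvals, eigvecs = np.linalg.eigh(A)   # eigvecs[:, i] is the i-th eigenvector
i, j = 2, 3                            # eigenvalue index i, coordinate j

M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)  # drop row and column j
sub_eigvals = np.linalg.eigvalsh(M_j)

lhs = eigvecs[j, i] ** 2 * np.prod(eigvals[i] - np.delete(eigvals, i))
rhs = np.prod(eigvals[i] - sub_eigvals)
print(np.isclose(lhs, rhs))            # True: squared loading recovered from eigenvalues alone
```

Note that the identity yields only squared loadings, so the signs of the eigenvector entries must be recovered separately; EESPCA builds on this relationship to approximate sparse principal component loadings.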