Automatic sparse PCA for high-dimensional data

Sparse principal component analysis (SPCA) methods have proven efficient for analyzing high-dimensional data. Among them, threshold-based SPCA (TSPCA) is computationally cheaper than regularized SPCA based on L1 penalties. Here, we investigate the efficacy of TSPCA in high-dimensional settings and show that, for a suitable threshold value, TSPCA achieves satisfactory performance. Its performance, however, depends heavily on the selected threshold value. To this end, we propose a novel thresholding estimator that obtains the principal component (PC) directions using a customized noise-reduction methodology. The proposed technique is consistent under mild conditions, is unaffected by threshold values, and therefore yields more accurate results at a lower computational cost. Furthermore, we explore the shrinkage PC directions and their application to clustering high-dimensional data. Finally, we evaluate the performance of the estimated shrinkage PC directions in actual data analyses.
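To make the dependence on the threshold concrete, here is a minimal sketch of generic threshold-based sparse PCA: compute the leading PC direction, zero out loadings whose magnitude falls below a user-chosen threshold, and renormalize. This is not the estimator proposed in the abstract, whose noise-reduction step is not described here. The function name `thresholded_pc`, the hard-thresholding rule, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def thresholded_pc(X, threshold):
    """Leading PC direction with small loadings hard-thresholded to zero.

    Illustrative only: a generic hard-thresholding rule, not the
    estimator from the abstract.
    """
    # Center the columns so the SVD corresponds to the sample covariance.
    Xc = X - X.mean(axis=0)
    # The first right singular vector is the leading PC direction.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    v = vt[0]
    # Keep only loadings whose magnitude reaches the threshold.
    v_sparse = np.where(np.abs(v) >= threshold, v, 0.0)
    norm = np.linalg.norm(v_sparse)
    if norm == 0.0:
        raise ValueError("threshold removed every loading")
    # Renormalize so the result is again a unit vector.
    return v_sparse / norm


# Simulated high-dimensional data (assumed setup): n = 20 samples,
# d = 500 variables, one sparse direction supported on 10 coordinates.
rng = np.random.default_rng(0)
n, d, k = 20, 500, 10
true_dir = np.zeros(d)
true_dir[:k] = 1.0 / np.sqrt(k)
scores = rng.normal(scale=10.0, size=(n, 1))
X = scores * true_dir + rng.normal(size=(n, d))

est = thresholded_pc(X, threshold=0.15)
print("nonzero loadings:", np.count_nonzero(est))
print("|cosine| with true direction:", abs(est @ true_dir))
```

Rerunning the last lines with a much smaller or much larger `threshold` shows how the recovered support changes, which is the sensitivity the abstract sets out to remove.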


PCA

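In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional one by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, the "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.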
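The construction above can be sketched with a singular value decomposition of the centered data. The helper name `pca` and the simulated data are assumptions for illustration. The rows of `components` are the orthonormal principal directions, and the projected scores come out uncorrelated.

```python
import numpy as np


def pca(X, n_components):
    """Return principal directions, their variances, and projected scores."""
    # Center each column; PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    # Sample variance captured along each direction.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    # Coordinates of the data in the new basis.
    scores = Xc @ components.T
    return components, explained_variance, scores


# Correlated 3-D data (assumed example): mix independent Gaussian columns.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

components, variances, scores = pca(X, n_components=2)
# Off-diagonal entries of the score covariance are zero up to rounding.
print(np.cov(scores, rowvar=False))
```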
