Sparse principal component analysis (SPCA) methods have proven effective for analyzing high-dimensional data. Among them, threshold-based SPCA (TSPCA) is computationally cheaper than regularized SPCA based on L1 penalties. Here, we investigate the efficacy of TSPCA in high-dimensional settings and show that, with a suitable threshold value, TSPCA performs satisfactorily. However, the performance of TSPCA depends heavily on the chosen threshold value. To address this, we propose a novel thresholding estimator of the principal component (PC) directions based on a customized noise-reduction methodology. The proposed estimator is consistent under mild conditions and insensitive to the threshold value, and it therefore yields more accurate results quickly and at a lower computational cost. Furthermore, we study shrinkage PC directions and their application to clustering high-dimensional data. Finally, we evaluate the performance of the estimated shrinkage PC directions in real data analyses.
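To make the threshold dependence concrete, the sketch below implements a generic hard-thresholding TSPCA step. It computes the leading sample PC direction, zeroes loadings whose magnitude falls below a threshold, and renormalizes. This is an illustration under stated assumptions, not the noise-reduction thresholding estimator proposed here. The names `tspca_direction` and `simulate_spiked`, the single-spike data model, and the threshold values are all hypothetical choices made for the example.

```python
import numpy as np


def tspca_direction(X: np.ndarray, threshold: float) -> np.ndarray:
    """Generic hard-thresholded leading PC direction (illustrative TSPCA sketch).

    X: (n, p) data matrix with observations in rows (p may exceed n).
    threshold: loadings with absolute value below this are set to zero.
    """
    Xc = X - X.mean(axis=0)  # center each variable
    # The leading right singular vector of the centered data equals the
    # leading eigenvector of the sample covariance matrix.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    h = vt[0].copy()
    h[np.abs(h) < threshold] = 0.0  # hard thresholding of small loadings
    norm = np.linalg.norm(h)
    if norm == 0.0:
        raise ValueError("threshold removed every loading; choose a smaller value")
    h /= norm  # renormalize to unit length
    # Fix the sign so the largest-magnitude loading is positive.
    if h[np.argmax(np.abs(h))] < 0:
        h = -h
    return h


def simulate_spiked(n=40, p=500, k=10, spike=400.0, seed=0):
    """Hypothetical single-spike model: the true direction is supported on the first k coordinates."""
    rng = np.random.default_rng(seed)
    h_true = np.zeros(p)
    h_true[:k] = 1.0 / np.sqrt(k)
    z = rng.standard_normal((n, 1))  # latent PC scores
    X = np.sqrt(spike) * z @ h_true[None, :] + rng.standard_normal((n, p))
    return X, h_true


if __name__ == "__main__":
    X, h_true = simulate_spiked()
    for t in (0.0, 0.1):
        h = tspca_direction(X, t)
        print(f"threshold={t}: nonzero={np.count_nonzero(h)}, |<h, h_true>|={abs(h @ h_true):.3f}")
```

With `threshold=0.0` every one of the p loadings stays nonzero. A moderate threshold recovers the sparse support, and a threshold that is too large removes every loading. This sensitivity to the threshold is the behavior the abstract describes for TSPCA.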