Principal component analysis (PCA) is widely used as an effective technique for feature extraction and dimension reduction. In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components with penalized loadings, with the penalty chosen automatically by model selection among the models obtained under varying penalties. Earlier work [1, 2] proposed penalized PCA and showed that model selection in $L_2$-penalized PCA is feasible through the solution path of Ridge regression; however, that approach is extremely time-consuming because it requires intensive computation of matrix inverses. In this paper, we propose a fast model selection method for penalized PCA, named Approximated Gradient Flow (AgFlow), which lowers the computational complexity by exploiting the implicit regularization effect introduced by (stochastic) gradient flow [3, 4] and obtains the complete solution path of $L_2$-penalized PCA under varying $L_2$-regularization. We perform extensive experiments on real-world datasets. AgFlow outperforms existing methods (Oja [5], Power [6], Shamir [7], and the vanilla Ridge estimators) in terms of computation costs.
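To make the link between gradient flow and the Ridge solution path concrete, the following minimal sketch compares plain gradient descent iterates on a least-squares problem with Ridge solutions computed by explicit linear solves. It is not the AgFlow algorithm itself. The helper names `gd_path` and `ridge_path`, the synthetic HDLSS data, and the calibration $\lambda = 1/t$ (with $t$ the accumulated step size) are assumptions made for illustration; the calibration comes from the general literature on implicit regularization, not from this abstract.

```python
import numpy as np

def gd_path(X, y, n_steps, step_scale=0.1):
    """Gradient descent on (1/2n)||Xw - y||^2 from w = 0; returns all iterates."""
    n, d = X.shape
    S = X.T @ X / n                      # scaled Gram matrix (d x d)
    b = X.T @ y / n
    # Step size is a fraction of 1 / (largest eigenvalue), so the iteration is stable.
    eta = step_scale / np.linalg.eigvalsh(S)[-1]
    w = np.zeros(d)
    times, path = [], []
    for k in range(1, n_steps + 1):
        w = w - eta * (S @ w - b)        # one gradient step, no matrix inverse
        times.append(k * eta)            # accumulated "time" t = k * eta
        path.append(w.copy())
    return np.array(times), np.array(path)

def ridge_path(X, y, lambdas):
    """Ridge solutions (S + lam I)^{-1} b, one linear solve per penalty."""
    n, d = X.shape
    S = X.T @ X / n
    b = X.T @ y / n
    return np.array([np.linalg.solve(S + lam * np.eye(d), b) for lam in lambdas])

# Synthetic HDLSS-style data (assumption): more features than samples.
rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

times, w_gd = gd_path(X, y, n_steps=200)
w_ridge = ridge_path(X, y, 1.0 / times)  # assumed calibration: lambda = 1 / t

# Gap between the two paths, relative to the minimum-norm least-squares solution.
w_mn = np.linalg.lstsq(X, y, rcond=None)[0]
gap = np.linalg.norm(w_gd - w_ridge, axis=1) / np.linalg.norm(w_mn)
print("largest relative gap along the path:", gap.max())
```

A single gradient descent run yields an entire family of estimates, one per iteration, whereas the Ridge path needs a separate linear solve for every penalty value. Under the assumed calibration the two paths track each other only approximately, so the gap printed at the end should be read as an indication of how close they are on this toy problem, not as a property of AgFlow.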