In this paper, we consider a new variant of principal component analysis (PCA) that aims to capture the grouping and/or sparse structures of the factor loadings simultaneously. To this end, we employ a non-convex truncated regularization with naturally adjustable sparsity and grouping effects, and propose Feature Grouping and Sparse Principal Component Analysis (FGSPCA). FGSPCA encourages factor loadings with similar values to collapse into disjoint homogeneous groups (feature grouping) or into a special zero-valued group (feature selection), which in turn helps reduce model complexity and improve model interpretability. Existing structured PCA methods usually require prior knowledge to construct the regularization term. In contrast, FGSPCA can capture the grouping and/or sparse structures of the factor loadings simultaneously without any prior information. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that combines difference-of-convex programming, the augmented Lagrangian method, and coordinate descent. Experimental results on both synthetic and real-world datasets demonstrate the promising performance and efficiency of the new method. An R implementation of FGSPCA is available on GitHub: https://github.com/higeeks/FGSPCA.
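The abstract does not state the form of the truncated regularization, so the following is only a minimal sketch under an assumed form: a truncated L1 function min(|x|/tau, 1) applied to each loading (sparsity) and to each pairwise difference of loadings (grouping). The names `truncated_l1`, `fgs_penalty`, `tau`, `lambda1`, and `lambda2` are hypothetical and are not taken from the paper or from the R package.

```python
import numpy as np


def truncated_l1(x, tau):
    """Assumed truncated L1 function: min(|x| / tau, 1), elementwise."""
    return np.minimum(np.abs(x) / tau, 1.0)


def fgs_penalty(beta, tau, lambda1=1.0, lambda2=1.0):
    """Illustrative sparsity-plus-grouping penalty on one loading vector.

    This is an assumed form for illustration, not the authors' definition.
    """
    beta = np.asarray(beta, dtype=float)
    # Sparsity term: each loading is penalized until it exceeds tau.
    sparsity = truncated_l1(beta, tau).sum()
    # Grouping term: each pair (i < j) is penalized until the two
    # loadings differ by more than tau.
    i, j = np.triu_indices(beta.size, k=1)
    grouping = truncated_l1(beta[i] - beta[j], tau).sum()
    return lambda1 * sparsity + lambda2 * grouping


# Loadings that fall into a zero group and one homogeneous group ...
grouped = np.array([0.0, 0.0, 0.5, 0.5, 0.5])
# ... versus loadings that are all distinct and nonzero.
scattered = np.array([0.1, -0.2, 0.3, 0.5, 0.7])

print(fgs_penalty(grouped, tau=0.05))
print(fgs_penalty(scattered, tau=0.05))
```

Under this assumed form, the penalty for a loading or a pairwise difference stops growing once its magnitude exceeds `tau`, so large loadings and well-separated groups are not shrunk further. Loadings that are exactly zero or exactly equal incur no cost, which is why the grouped vector scores lower than the scattered one.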