We consider the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components that have a support of fixed cardinality. While most existing methods build principal components (PCs) iteratively through deflation, we propose GeoSPCA, a novel algorithm that builds all PCs at once while satisfying the orthogonality constraints, which brings substantial benefits over deflation. This approach is based on the left eigenvalues of the covariance matrix, which helps circumvent the non-convexity of the problem by approximating the optimal solution with a binary linear optimization problem that can be solved to optimality. The resulting approximation can be used to tackle different versions of the sparse PCA problem, including the cases in which the principal components share the same support or have disjoint supports, and the Structured Sparse PCA problem. We also propose optimality bounds and illustrate the benefits of GeoSPCA on selected real-world problems in terms of explained variance, sparsity, and tractability. Improvements over the greedy algorithm, which is often on par with state-of-the-art techniques, reach up to 24% in terms of variance while solving real-world problems with tens of thousands of variables and support cardinalities in the hundreds in minutes. We also apply GeoSPCA to a face recognition problem, yielding more than 10% improvement over other PCA-based techniques such as structured sparse PCA.
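To make the shared-support setting concrete, the following is a minimal Python sketch of a greedy baseline, not of GeoSPCA itself. It relies on a standard fact: for a fixed support, the variance explained by the best orthonormal PCs restricted to that support equals the sum of the largest eigenvalues of the corresponding covariance submatrix. The function names `explained_variance` and `greedy_shared_support` are hypothetical, and the greedy rule shown here is one plausible variant rather than the exact baseline used in the paper.

```python
import numpy as np


def explained_variance(cov, support, n_components):
    """Variance captured by the best orthonormal PCs supported on `support`.

    Assumes n_components >= 1. Equals the sum of the top eigenvalues of the
    covariance submatrix indexed by `support`.
    """
    sub = cov[np.ix_(support, support)]
    eigvals = np.linalg.eigvalsh(sub)  # eigenvalues in ascending order
    return float(eigvals[-n_components:].sum())


def greedy_shared_support(cov, k, n_components):
    """Illustrative greedy baseline (an assumption, not the paper's method).

    Grows a shared support one variable at a time, always adding the variable
    that most increases the explained variance, then returns the support and
    the sparse orthonormal components defined on it.
    """
    p = cov.shape[0]
    support = []
    for _ in range(k):
        candidates = [j for j in range(p) if j not in support]
        best = max(
            candidates,
            key=lambda j: explained_variance(cov, support + [j], n_components),
        )
        support.append(best)
    support.sort()

    # Top eigenvectors of the submatrix, embedded into p dimensions with zeros
    # outside the support; they are orthonormal by construction.
    sub = cov[np.ix_(support, support)]
    _, eigvecs = np.linalg.eigh(sub)
    m = min(n_components, k)
    components = np.zeros((p, m))
    components[support, :] = eigvecs[:, ::-1][:, :m]
    return support, components
```

Calling `greedy_shared_support(cov, k=4, n_components=2)` on an 8-variable covariance matrix would return four selected indices and an 8-by-2 matrix with orthonormal columns that are zero outside those indices. Each greedy step recomputes an eigendecomposition per candidate, so this sketch is only practical for small problems.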