Sparse PCA: a Geometric Approach

We consider the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components whose supports have a fixed cardinality. While most existing methods build principal components (PCs) iteratively through deflation, we propose GeoSPCA, a novel algorithm that builds all PCs at once while satisfying the orthogonality constraints, which brings substantial benefits over deflation. This approach is based on the left eigenvalues of the covariance matrix, which help circumvent the non-convexity of the problem by approximating the optimal solution with a binary linear optimization problem that can find the optimal solution. The resulting approximation can be used to tackle different versions of the sparse PCA problem, including the cases in which the principal components share the same support or have disjoint supports, as well as the Structured Sparse PCA problem. We also propose optimality bounds and illustrate the benefits of GeoSPCA on selected real-world problems in terms of explained variance, sparsity, and tractability. Improvements over the greedy algorithm, which is often on par with state-of-the-art techniques, reach up to 24% in explained variance, while real-world problems with tens of thousands of variables and support cardinalities in the hundreds are solved in minutes. We also apply GeoSPCA to a face recognition problem, yielding more than 10% improvement over other PCA-based techniques such as structured sparse PCA.
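
The fixed-cardinality objective in the shared-support case can be made concrete with a small brute-force sketch. This is not GeoSPCA. It assumes that, once a support S of k variables is fixed, the best r orthogonal PCs supported on S explain the sum of the r largest eigenvalues of the covariance submatrix Σ_SS. It then compares a greedy forward-selection heuristic (our assumed reading of "the greedy algorithm") with exhaustive search over all supports, which is only feasible for very small problems. All function names and the toy covariance matrix are hypothetical, and the Jacobi eigenvalue routine is included only to keep the sketch dependency-free.

```python
import math
from itertools import combinations


def _matmul(x, y):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    m = [list(map(float, row)) for row in a]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Angle that makes the (p, q) entry of J^T m J vanish.
                theta = 0.5 * math.atan2(2 * m[p][q], m[q][q] - m[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == k) for k in range(n)] for i in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                m = _matmul(rot_t, _matmul(m, rot))  # similarity transform
    return [m[i][i] for i in range(n)]


def explained_variance(cov, support, r):
    """Variance captured by the best r orthogonal PCs supported on `support`.

    Assumes this equals the sum of the r largest eigenvalues of the
    principal submatrix cov[support, support].
    """
    sub = [[cov[i][j] for j in support] for i in support]
    eig = sorted(jacobi_eigenvalues(sub), reverse=True)
    return sum(eig[:r])


def greedy_support(cov, k, r):
    """Greedy forward selection: k times, add the variable that helps most."""
    support = []
    while len(support) < k:
        candidates = [j for j in range(len(cov)) if j not in support]
        best = max(candidates,
                   key=lambda j: explained_variance(cov, support + [j], r))
        support.append(best)
    return sorted(support)


def exhaustive_support(cov, k, r):
    """Exact optimum by brute force over all C(p, k) supports (tiny p only)."""
    return list(max(combinations(range(len(cov)), k),
                    key=lambda s: explained_variance(cov, list(s), r)))


if __name__ == "__main__":
    # Hypothetical 4-variable covariance matrix, for illustration only.
    cov = [[4.0, 1.5, 0.0, 0.0],
           [1.5, 3.0, 0.0, 0.0],
           [0.0, 0.0, 3.5, 1.8],
           [0.0, 0.0, 1.8, 3.2]]
    for name, solver in [("greedy", greedy_support),
                         ("exhaustive", exhaustive_support)]:
        s = solver(cov, 2, 1)
        print(name, s, round(explained_variance(cov, s, 1), 4))
```

On this toy matrix with k = 2 and r = 1, greedy search first takes variable 0 (the largest single variance) and ends at support [0, 1], explaining about 5.08, whereas exhaustive search finds [2, 3], explaining about 5.16. A gap of this kind is what a search over whole supports, rather than one variable at a time, can close.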

Related content

PCA

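In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space while maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.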

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Latest reinforcement learning tutorial (17-page PDF)

Experience and advice for getting started with machine learning

[Harvard Business School course, Fall 2019] Machine learning interpretability