Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100$s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting $k=5$ covariates from $p=300$ variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1$–$2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical datasets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
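To make the notions of "certifiable optimality" and "bound gap" concrete, here is a minimal sketch on a toy instance. It is not the paper's cutting-plane method or its convex relaxation. It assumes the standard sparse PCA formulation (maximize $x^\top \Sigma x$ subject to $\|x\|_2 = 1$ and at most $k$ nonzero entries), solves it by brute-force enumeration, and compares the result with a simple greedy heuristic and a crude upper bound (the largest eigenvalue of $\Sigma$). The function names `sparse_pca_exact` and `sparse_pca_greedy` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

import numpy as np


def sparse_pca_exact(sigma, k):
    """Enumerate every support of size k (only viable for tiny p).

    For a fixed support, the best unit vector is the top eigenvector of the
    corresponding principal submatrix, so the objective is its top eigenvalue.
    """
    p = sigma.shape[0]
    best_val, best_support = -np.inf, None
    for support in combinations(range(p), k):
        idx = np.array(support)
        val = np.linalg.eigvalsh(sigma[np.ix_(idx, idx)])[-1]
        if val > best_val:
            best_val, best_support = val, support
    return best_val, best_support


def sparse_pca_greedy(sigma, k):
    """Grow the support one index at a time; yields a feasible lower bound."""
    p = sigma.shape[0]
    support = []
    best_val = -np.inf
    for _ in range(k):
        best_val, best_j = -np.inf, None
        for j in range(p):
            if j in support:
                continue
            idx = np.array(support + [j])
            val = np.linalg.eigvalsh(sigma[np.ix_(idx, idx)])[-1]
            if val > best_val:
                best_val, best_j = val, j
        support.append(best_j)
    return best_val, tuple(sorted(support))


# Toy data: 50 samples, p = 8 features, select k = 3 of them.
rng = np.random.default_rng(0)
sigma = np.cov(rng.standard_normal((50, 8)), rowvar=False)

exact_val, exact_support = sparse_pca_exact(sigma, 3)
greedy_val, greedy_support = sparse_pca_greedy(sigma, 3)

# Dropping the sparsity constraint gives a (usually loose) upper bound.
upper = np.linalg.eigvalsh(sigma)[-1]
gap = (upper - greedy_val) / upper

print("exact :", exact_val, exact_support)
print("greedy:", greedy_val, greedy_support)
print("relative bound gap (greedy vs. crude upper bound):", gap)
```

Enumeration visits $\binom{p}{k}$ supports, which is why it stops being practical long before the problem sizes discussed in the abstract. A certificate of (near) optimality in general pairs a feasible solution with an upper bound; the tighter the bound, the smaller the reported gap.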

Related content

PCA

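In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.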
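The procedure described above can be sketched with NumPy. This is a minimal illustration, assuming the data is a samples-by-features array and that the components are obtained from the singular value decomposition of the centered data; the helper name `pca` is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Return (components, explained_variance, scores) for data matrix X.

    X has shape (n_samples, n_features). Rows of `components` are the
    principal directions, ordered by decreasing variance.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]               # orthonormal directions
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T                   # projected coordinates
    return components, explained_variance, scores


# Toy data: 200 correlated points in 3 dimensions.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.1]])
X = rng.standard_normal((200, 3)) @ mixing

components, explained_variance, scores = pca(X, 2)
print("components:\n", components)
print("explained variance:", explained_variance)
```

The rows of `components` are orthonormal, and the columns of `scores` are uncorrelated with each other, which matches the orthogonal, uncorrelated basis described in the text.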