Sparse PCA With Multiple Components

Sparse Principal Component Analysis (sparse PCA) is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing work addresses sparse PCA via heuristics such as iteratively computing one sparse PC and deflating the covariance matrix, which guarantees neither the orthogonality nor the optimality of the resulting solution. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations and propose tractable second-order cone versions of these relaxations that supply high-quality upper bounds. We also design valid second-order cone inequalities that hold when each PC's individual sparsity is specified, and demonstrate that these inequalities tighten our relaxations significantly. Moreover, we propose exact methods and rounding mechanisms that exploit these relaxations' tightness to obtain solutions with bound gaps on the order of 1%-5% for real-world datasets with p in the hundreds or thousands of features and r ∈ {2, 3} components. We investigate the performance of our methods in spiked covariance settings and demonstrate that simultaneously considering the orthogonality and sparsity constraints leads to improvements in the Area Under the ROC curve of 2%-8% compared to state-of-the-art deflation methods. All in all, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
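For orientation, one common way to write the problem the abstract describes is sketched below. The notation is an assumption introduced here, not taken from the abstract: Σ is the p × p sample covariance matrix, U is the p × r matrix whose columns are the PCs, and k is an overall sparsity budget on the entries of U.

```latex
\max_{U \in \mathbb{R}^{p \times r}} \ \langle U U^\top, \Sigma \rangle
\quad \text{s.t.} \quad U^\top U = I_r, \quad \|U\|_0 \le k
```

Under this reading, the objective equals tr(U^T Σ U), which is convex in U when Σ is positive semidefinite, hence "convex maximization". The constraint U^T U = I_r also means that Y = U U^T is an orthogonal projection matrix of rank r, which is one way an orthogonality condition can be recast as a rank constraint on Y.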


Related content

PCA

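In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.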
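The description above can be illustrated with a minimal NumPy sketch of ordinary (dense) PCA. It assumes the principal components are obtained from an eigendecomposition of the sample covariance matrix; the function name `principal_components` and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np

def principal_components(X, r):
    """Return the top-r principal directions (as columns) and their variances."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]    # indices of the r largest eigenvalues
    return eigvecs[:, order], eigvals[order]

# Synthetic data (an assumption for illustration): 500 points in 3-D
# with different spreads per axis, then linearly mixed.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])
X = X @ np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.5],
                  [0.0, 0.0, 1.0]])

U, variances = principal_components(X, r=2)
scores = (X - X.mean(axis=0)) @ U            # data projected onto the PCs

gram = U.T @ U                               # should be the 2 x 2 identity
score_cov = np.cov(scores, rowvar=False)     # should be diagonal
```

Here `gram` checks that the basis is orthonormal, and the near-zero off-diagonal entries of `score_cov` check that the projected dimensions are uncorrelated. Sparse PCA additionally limits the number of nonzero entries in `U`, which this dense sketch does not attempt.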
