Sparse Principal Component Analysis is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity- and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing work addresses sparse PCA via heuristics such as iteratively computing one sparse PC and deflating the covariance matrix, which does not guarantee the orthogonality, let alone the optimality, of the resulting solution. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations and propose tractable second-order cone versions of these relaxations, which supply high-quality upper bounds. We also design valid second-order cone inequalities that hold when each PC's individual sparsity is specified, and demonstrate that these inequalities tighten our relaxations significantly. Moreover, we propose exact methods and rounding mechanisms that exploit these relaxations' tightness to obtain solutions with a bound gap on the order of 1%-5% for real-world datasets with $p$ in the hundreds or thousands of features and $r \in \{2, 3\}$ components. We investigate the performance of our methods in spiked covariance settings and demonstrate that simultaneously considering the orthogonality and sparsity constraints leads to improvements in the Area Under the ROC curve of 2%-8% compared to state-of-the-art deflation methods. All in all, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
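
For orientation, the sparsity- and orthogonality-constrained problem described here can be sketched as follows. The notation is an assumption for illustration and does not appear in the abstract: $\Sigma$ denotes the $p \times p$ covariance matrix, $U$ the $p \times r$ matrix whose columns are the PCs, and $k$ an overall sparsity budget.

```latex
% Illustrative formulation only; the symbols \Sigma, U, and k are assumed notation.
\begin{equation*}
\max_{U \in \mathbb{R}^{p \times r}} \ \operatorname{tr}\!\left(U^\top \Sigma U\right)
\quad \text{s.t.} \quad U^\top U = I_r, \quad \|U\|_0 \le k .
\end{equation*}
```

Because $\Sigma$ is positive semidefinite, the objective is convex in $U$, so this is a convex maximization problem rather than a convex optimization problem. When each PC's individual sparsity is specified, the single budget would plausibly be replaced by one constraint per column, $\|u_t\|_0 \le k_t$ for $t = 1, \dots, r$, where $u_t$ and $k_t$ are likewise assumed names.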