We analyze a practical algorithm for sparse PCA on incomplete and noisy data under a general non-random sampling scheme. The algorithm is based on a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We show that, under certain conditions, the relaxation has a unique solution from which the support of the sparse leading eigenvector can be recovered with high probability. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the observed entries, which we describe through the concepts of algebraic connectivity and irregularity. We validate the theorem empirically on synthetic and real data. We also show that our algorithm outperforms several other sparse PCA approaches, especially when the observed entries have good structural properties. As a by-product of our analysis, we provide two theorems for handling deterministic sampling schemes, which can be applied to other matrix-related problems.
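To make the notion of algebraic connectivity concrete, the following is a minimal sketch rather than the paper's procedure. It assumes that the observed entries of a symmetric $d \times d$ matrix are encoded as a boolean mask and read as the edges of an undirected graph on $d$ nodes; the function name `algebraic_connectivity` and the mask encoding are illustrative choices, not taken from the paper. Algebraic connectivity is then the second-smallest eigenvalue of that graph's Laplacian.

```python
import numpy as np


def algebraic_connectivity(mask):
    """Second-smallest Laplacian eigenvalue of the observation graph.

    Assumption (not from the paper): mask[i, j] is truthy when entry
    (i, j) is observed, and each observed off-diagonal entry is an
    undirected edge between nodes i and j.
    """
    A = np.asarray(mask, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1] or A.shape[0] < 2:
        raise ValueError("mask must be a square matrix with at least 2 rows")
    A = np.maximum(A, A.T)        # an edge exists if either (i, j) or (j, i) is observed
    np.fill_diagonal(A, 0.0)      # diagonal entries are not edges
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian: degree matrix minus adjacency
    eigvals = np.linalg.eigvalsh(L)  # eigenvalues in ascending order
    return float(eigvals[1])


# Fully observed 4 x 4 matrix: the observation graph is complete.
full_mask = np.ones((4, 4), dtype=bool)

# Two disjoint 2 x 2 blocks observed: the observation graph is disconnected.
block_mask = np.zeros((4, 4), dtype=bool)
block_mask[:2, :2] = True
block_mask[2:, 2:] = True

print(algebraic_connectivity(full_mask))
print(algebraic_connectivity(block_mask))
```

A complete graph on $n$ nodes has algebraic connectivity $n$, while any disconnected graph has algebraic connectivity $0$, so the two masks above sit at opposite ends of the range for $d = 4$. Under the assumed encoding, a larger value corresponds to a better-connected observation pattern.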