We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.
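The two-stage pipeline described above can be illustrated on noiseless synthetic data. The sketch below is not the authors' implementation. It makes these assumptions: Stage I uses a simple trimmed PCA on unit-normalised columns as a stand-in for a robust-PCA method, and Stage II uses a sort-and-refit alternating heuristic with random restarts as a stand-in for a linear-regression-without-correspondences solver. The function names `trimmed_pca`, `match_by_sorting`, and `restore_column`, and all parameter values, are hypothetical. The Stage II heuristic can stop at a local minimum, so exact restoration is not guaranteed.

```python
import numpy as np


def trimmed_pca(Y, r, keep=0.5, n_rounds=5):
    """Stage I stand-in (assumption): PCA with iterative trimming of badly fit columns."""
    Yn = Y / (np.linalg.norm(Y, axis=0) + 1e-12)   # unit-norm columns
    kept = np.arange(Y.shape[1])
    n_keep = max(r, int(keep * Y.shape[1]))
    for _ in range(n_rounds):
        # Fit a rank-r subspace to the currently kept columns.
        U = np.linalg.svd(Yn[:, kept], full_matrices=False)[0][:, :r]
        # Distance of every column to that subspace; keep the closest ones.
        resid = np.linalg.norm(Yn - U @ (U.T @ Yn), axis=0)
        kept = np.argsort(resid)[:n_keep]
    return U


def match_by_sorting(y, target):
    """Rearrange y so that its entry ranks agree with those of target.

    This minimises ||P y - target|| over all permutations P.
    """
    out = np.empty_like(y)
    out[np.argsort(target)] = np.sort(y)
    return out


def restore_column(y, U, rng, n_init=100, n_iter=30, tol=1e-9):
    """Stage II stand-in (assumption): alternate sort-matching and least squares."""
    def residual(x):
        return np.linalg.norm(x - U @ (U.T @ x))

    best, best_res = y.copy(), residual(y)
    if best_res <= tol * max(np.linalg.norm(y), 1.0):
        return best                      # column already lies in the subspace
    for c in rng.standard_normal((n_init, U.shape[1])):   # random restarts
        x = match_by_sorting(y, U @ c)
        for _ in range(n_iter):
            # U has orthonormal columns, so U @ (U.T @ x) is the least-squares fit.
            x_new = match_by_sorting(y, U @ (U.T @ x))
            if np.array_equal(x_new, x):
                break
            x = x_new
        if residual(x) < best_res:
            best, best_res = x, residual(x)
    return best


# Synthetic example: rank-2 data in R^30, with 12 of 60 columns shuffled.
rng = np.random.default_rng(0)
m, n, r = 30, 60, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # ground truth
Y = X.copy()
for j in rng.choice(n, size=12, replace=False):
    Y[:, j] = rng.permutation(Y[:, j])

U_hat = trimmed_pca(Y, r)
X_hat = np.column_stack([restore_column(Y[:, j], U_hat, rng) for j in range(n)])
print("relative restoration error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```

By construction, every restored column is a rearrangement of the observed column and is never farther from the estimated subspace than the observed one. Whether it equals the ground truth depends on the restarts reaching the global minimum.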