Principal component analysis (PCA) is one of the most widely used techniques for dimension reduction and multivariate statistics. From a probabilistic perspective, PCA seeks a low-dimensional representation of data in the presence of independent and identically distributed (IID) Gaussian noise. Probabilistic PCA (PPCA) and its variants have been studied extensively for decades, and most of them assume that the underlying noise is IID under some distribution. Real-world noise, however, is usually complicated and structured. To address this challenge, several PCA variants for data with non-IID noise have been proposed, but most of them only assume that the noise is correlated in the feature space, whereas the noise may be structured in two ways, across features and across samples. To this end, we propose a powerful and intuitive PCA method (MN-PCA) that models graphical noise with the matrix normal distribution, which enables us to explore the structure of the noise in both the feature space and the sample space. MN-PCA simultaneously obtains a low-rank representation of the data and the structure of the noise, and it can be interpreted as approximating the data under a generalized Mahalanobis distance. We develop two algorithms to fit this model: one maximizes the regularized likelihood, and the other, which is more robust, exploits the Wasserstein distance. Extensive experiments on various data demonstrate their effectiveness.
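To make the generalized Mahalanobis interpretation concrete, the following is a minimal NumPy sketch, not the MN-PCA algorithm itself. It assumes the row (sample) and column (feature) noise covariances are already known, whereas MN-PCA estimates the noise structure together with the low-rank part. Under that assumption, the best rank-k approximation in the generalized Mahalanobis distance follows from whitening the data, truncating its SVD, and undoing the whitening. All names (sqrtm_spd, gen_mahalanobis_sq, weighted_low_rank, ar1_cov) and the AR(1)-style covariances are hypothetical choices made for illustration only.

```python
import numpy as np


def sqrtm_spd(a):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(w)) @ v.T


def gen_mahalanobis_sq(r, omega_row, omega_col):
    """Squared generalized Mahalanobis norm tr(omega_row @ r @ omega_col @ r.T)."""
    return float(np.trace(omega_row @ r @ omega_col @ r.T))


def weighted_low_rank(x, omega_row, omega_col, k):
    """Rank-k minimizer of the generalized Mahalanobis distance to x.

    Assumes the precision matrices omega_row and omega_col are known.
    """
    a = sqrtm_spd(omega_row)  # whitening transform for the sample space
    b = sqrtm_spd(omega_col)  # whitening transform for the feature space
    u, s, vt = np.linalg.svd(a @ x @ b, full_matrices=False)
    y_k = (u[:, :k] * s[:k]) @ vt[:k]  # truncated SVD of the whitened data
    return np.linalg.solve(a, y_k) @ np.linalg.inv(b)  # undo the whitening


def ar1_cov(d, rho):
    """Illustrative AR(1)-style covariance with entries rho**|i - j|."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])


rng = np.random.default_rng(0)
n, p, k = 30, 20, 2
sigma_row = ar1_cov(n, 0.7)  # assumed noise covariance across samples
sigma_col = ar1_cov(p, 0.5)  # assumed noise covariance across features

# Low-rank signal plus matrix normal noise E = Sigma_row^{1/2} Z Sigma_col^{1/2}.
signal = rng.standard_normal((n, k)) @ rng.standard_normal((k, p))
z = rng.standard_normal((n, p))
x = signal + sqrtm_spd(sigma_row) @ z @ sqrtm_spd(sigma_col)

omega_row = np.linalg.inv(sigma_row)
omega_col = np.linalg.inv(sigma_col)
l_weighted = weighted_low_rank(x, omega_row, omega_col, k)

# Ordinary PCA-style baseline: truncated SVD of the raw data.
u, s, vt = np.linalg.svd(x, full_matrices=False)
l_plain = (u[:, :k] * s[:k]) @ vt[:k]

d_weighted = gen_mahalanobis_sq(x - l_weighted, omega_row, omega_col)
d_plain = gen_mahalanobis_sq(x - l_plain, omega_row, omega_col)
print(d_weighted, d_plain)
```

With identity precision matrices the distance reduces to the squared Frobenius norm, so the sketch falls back to the usual truncated SVD; the two fits differ only when the noise is correlated across samples or features.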