Principal component analysis (PCA) is a standard dimensionality reduction technique used in various research and applied fields. From an algorithmic point of view, classical PCA can be formulated in terms of operations on a multivariate Gaussian likelihood. As a consequence of the implied Gaussian formulation, the principal components are not robust to outliers. In this paper, we propose a modified formulation, based on the use of a multivariate Cauchy likelihood instead of the Gaussian likelihood, which has the effect of robustifying the principal components. We present an algorithm to compute these robustified principal components. We additionally derive the relevant influence function of the first component and examine its theoretical properties. Simulation experiments on high-dimensional datasets demonstrate that the estimated principal components based on the Cauchy likelihood outperform or are on par with existing robust PCA techniques.
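The contrast between the two likelihoods can be illustrated on a toy two-dimensional dataset. The sketch below is not the algorithm proposed in the paper, which the abstract does not spell out. It assumes, by analogy with classical PCA maximizing the Gaussian maximum-likelihood variance of the projected data, that a robustified first direction can be taken as the unit vector maximizing the maximum-likelihood scale of a univariate Cauchy fitted to the projections. The function names (`first_pc_classical`, `cauchy_scale_mle`, `first_pc_cauchy_2d`), the angle grid search, and the EM iteration used for the Cauchy fit are all illustrative choices and do not come from the source.

```python
import numpy as np

def first_pc_classical(X):
    """Leading eigenvector of the sample covariance (classical PCA)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def cauchy_scale_mle(z, n_iter=200):
    """Maximum-likelihood scale of a univariate Cauchy, via EM iterations.

    Assumption: the Cauchy is treated as a Student t with 1 degree of
    freedom, whose standard EM weights are 2 / (1 + d^2).
    """
    mu = np.median(z)
    q75, q25 = np.percentile(z, [75, 25])
    sigma2 = max((0.5 * (q75 - q25)) ** 2, 1e-12)  # robust starting value
    for _ in range(n_iter):
        w = 2.0 / (1.0 + (z - mu) ** 2 / sigma2)
        mu = np.sum(w * z) / np.sum(w)
        sigma2 = max(np.sum(w * (z - mu) ** 2) / len(z), 1e-12)
    return np.sqrt(sigma2)

def first_pc_cauchy_2d(X, n_angles=180):
    """Hypothetical robust direction: grid search over unit vectors in 2D,
    keeping the one whose projections have the largest fitted Cauchy scale."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    scales = [cauchy_scale_mle(X @ u) for u in dirs]
    return dirs[int(np.argmax(scales))]

# Deterministic toy data: 50 points spread along the x-axis with small
# alternating jitter in y, plus a single gross outlier far along the y-axis.
t = np.linspace(-1.0, 1.0, 50)
clean = np.column_stack([t, 0.05 * (-1.0) ** np.arange(50)])
contaminated = np.vstack([clean, [0.0, 100.0]])

pc_clean = first_pc_classical(clean)          # close to the x-axis
pc_gauss = first_pc_classical(contaminated)   # pulled toward the outlier
pc_cauchy = first_pc_cauchy_2d(contaminated)  # stays close to the x-axis

print("classical, clean data:       ", pc_clean)
print("classical, with one outlier: ", pc_gauss)
print("Cauchy-scale sketch, outlier:", pc_cauchy)
```

In this example a single outlier is enough to rotate the classical first component by roughly ninety degrees, while the direction based on the Cauchy scale remains near the axis along which the bulk of the data varies. The grid search is only practical in two dimensions; it is used here for transparency and says nothing about how the paper handles high-dimensional data.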