Principal Component Analysis (PCA) has been widely used for dimensionality reduction and feature extraction. Robust PCA (RPCA), under different robust distance metrics such as the l1-norm and l2,p-norm, can deal with noise or outliers to some extent. However, real-world data may exhibit structures that cannot be fully captured by these simple functions. In addition, existing methods treat complex and simple samples equally. By contrast, humans typically learn from simple to complex and from less to more. Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers. Notably, the complexity of each sample is calculated at the beginning of each iteration so that samples are integrated into training from simple to more complex. Based on alternating optimization, SPCA finds an optimal projection matrix and filters out outliers iteratively. A theoretical analysis is presented to show the rationality of SPCA. Extensive experiments on popular data sets demonstrate that the proposed method improves on state-of-the-art results considerably.
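The self-paced scheme described above (score each sample's complexity at the start of every iteration, train on the easy samples first, then gradually admit harder ones) can be sketched as follows. This is a minimal illustrative sketch of generic self-paced learning combined with ordinary SVD-based PCA, not the authors' exact SPCA objective; the reconstruction-error complexity measure, the age parameter `lam`, and the growth factor `mu` are assumptions introduced for illustration.

```python
import numpy as np

def self_paced_pca(X, k, lam=5.0, mu=1.3, n_iter=10):
    """Illustrative self-paced PCA sketch (not the paper's exact SPCA).

    X   : (n_samples, n_features) data matrix, assumed already centered
    k   : target dimensionality of the projection
    lam : self-paced "age" parameter; samples whose reconstruction
          error is below lam count as simple and enter training
    mu  : growth factor (> 1) that admits more complex samples
          at each iteration
    """
    n, d = X.shape
    # Initialize the projection from plain PCA on all samples.
    W = np.linalg.svd(X, full_matrices=False)[2][:k].T  # (d, k)
    v = np.ones(n)  # sample-selection weights (1 = included)
    for _ in range(n_iter):
        # Complexity of each sample under the current projection:
        # squared reconstruction error (outliers score high).
        errs = np.sum((X - X @ W @ W.T) ** 2, axis=1)
        # Self-paced step: keep only the currently "simple" samples.
        v = (errs < lam).astype(float)
        if v.sum() < k:  # guarantee enough samples to refit W
            v[np.argsort(errs)[:k]] = 1.0
        # PCA step: refit the projection on the selected samples,
        # so outliers are effectively filtered out of this update.
        Xs = X[v > 0]
        W = np.linalg.svd(Xs, full_matrices=False)[2][:k].T
        lam *= mu  # next round admits harder samples
    return W, v
```

The two alternating updates mirror the abstract's description: the selection weights `v` filter out outliers, and the projection matrix `W` is refit only on the samples currently deemed simple, with the threshold relaxed each iteration.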