Principal Component Analysis (PCA) has been widely used for dimensionality reduction and feature extraction. Robust PCA (RPCA), under different robust distance metrics such as the l1-norm and l2,p-norm, can deal with noise or outliers to some extent. However, real-world data may exhibit structures that cannot be fully captured by these simple functions. In addition, existing methods treat complex and simple samples equally. By contrast, humans typically learn from simple to complex and from less to more. Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers. Notably, the complexity of each sample is calculated at the beginning of each iteration so that samples are integrated into training from simple to more complex. Based on alternating optimization, SPCA finds an optimal projection matrix and filters out outliers iteratively. A theoretical analysis is presented to show the rationality of SPCA. Extensive experiments on popular data sets demonstrate that the proposed method improves on state-of-the-art results considerably.
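The self-paced scheme described above (score each sample's complexity at the start of every iteration, train on the easy samples first, then gradually admit harder ones) can be sketched as follows. This is a minimal illustrative sketch of generic self-paced learning combined with ordinary SVD-based PCA, not the authors' exact SPCA objective; the reconstruction-error complexity measure, the age parameter `lam`, and the growth factor `mu` are assumptions introduced for illustration.

```python
import numpy as np

def self_paced_pca(X, k, lam=5.0, mu=1.3, n_iter=10):
    """Illustrative self-paced PCA sketch (not the paper's exact SPCA).

    X   : (n_samples, n_features) data matrix, assumed already centered
    k   : target dimensionality of the projection
    lam : self-paced "age" parameter; samples whose reconstruction
          error is below lam count as simple and enter training
    mu  : growth factor (> 1) that admits more complex samples
          at each iteration
    """
    n, d = X.shape
    # Initialize the projection from plain PCA on all samples.
    W = np.linalg.svd(X, full_matrices=False)[2][:k].T  # (d, k)
    v = np.ones(n)  # sample-selection weights (1 = included)
    for _ in range(n_iter):
        # Complexity of each sample under the current projection:
        # squared reconstruction error (outliers score high).
        errs = np.sum((X - X @ W @ W.T) ** 2, axis=1)
        # Self-paced step: keep only the currently "simple" samples.
        v = (errs < lam).astype(float)
        if v.sum() < k:  # guarantee enough samples to refit W
            v[np.argsort(errs)[:k]] = 1.0
        # PCA step: refit the projection on the selected samples,
        # so outliers are effectively filtered out of this update.
        Xs = X[v > 0]
        W = np.linalg.svd(Xs, full_matrices=False)[2][:k].T
        lam *= mu  # next round admits harder samples
    return W, v
```

The two alternating updates mirror the abstract's description: the selection weights `v` filter out outliers, and the projection matrix `W` is refit only on the samples currently deemed simple, with the threshold relaxed each iteration.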