Robust self-tuning semiparametric PCA for contaminated elliptical distribution

Principal component analysis (PCA) is one of the most popular dimension reduction methods. Standard PCA is known to be sensitive to outliers, and many robust PCA methods have therefore been developed. Among them, Tyler's M-estimator is shown to be the most robust scatter estimator under the elliptical distribution. However, when the underlying distribution is contaminated and deviates from ellipticity, Tyler's M-estimator might not work well. In this article, we apply semiparametric theory to propose a robust semiparametric PCA. The merits of our proposal are twofold. First, it is robust both to heavy-tailed elliptical distributions and to non-elliptical outliers. Second, it pairs well with a data-driven tuning procedure, which is based on the active ratio and can adapt to different degrees of data outlyingness. Theoretical properties are derived, including the influence functions for various statistical functionals and asymptotic normality. Simulation studies and a data analysis demonstrate the superiority of our method.
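For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of PCA built on Tyler's M-estimator of scatter, computed with the standard fixed-point iteration. It is not the semiparametric method proposed in the article. The function names tyler_scatter and tyler_pca, the coordinatewise-median centering, the trace normalization, and the stopping rule are all assumptions made for illustration.

```python
import numpy as np

def tyler_scatter(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter for centered data X (n x p).

    Illustrative sketch: standard fixed-point iteration, with the
    estimate normalized so that its trace equals p.
    """
    # Drop rows that are exactly zero, since they carry no direction.
    X = X[np.linalg.norm(X, axis=1) > 0]
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        # d_i = x_i^T sigma^{-1} x_i for every observation
        d = np.sum(X.T * np.linalg.solve(sigma, X.T), axis=0)
        # Weighted covariance with weights 1 / d_i
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)
        converged = np.linalg.norm(new - sigma, ord="fro") < tol
        sigma = new
        if converged:
            break
    return sigma

def tyler_pca(X, k):
    """Leading k eigenvectors of Tyler's scatter estimate (p x k)."""
    # Assumption: center at the coordinatewise median.
    Xc = X - np.median(X, axis=0)
    vals, vecs = np.linalg.eigh(tyler_scatter(Xc))
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]
```

Because each observation is weighted by the inverse of its squared distance, the estimate depends only on the directions of the centered points, which is why it tolerates heavy-tailed elliptical data. It offers no such protection against outliers that break ellipticity, which is the case the article targets.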

Related content

PCA

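In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.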
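The construction described above can be sketched with an eigendecomposition of the sample covariance matrix. This is a minimal illustration of classical (non-robust) PCA; the function name pca and its return values are assumptions chosen for the example.

```python
import numpy as np

def pca(X, k):
    """Classical PCA of data X (n x p), keeping k components.

    Returns the components (p x k), the variance along each
    component, and the projected data (n x k).
    """
    # Center the data so that variances are measured about the mean.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric input.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    components = vecs[:, order]
    scores = Xc @ components
    return components, vals[order], scores

rng = np.random.default_rng(0)
# Correlated 3-dimensional sample data, generated only for the demo.
A = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.3]])
X = rng.standard_normal((500, 3)) @ A.T
components, variances, scores = pca(X, 2)
# The off-diagonal entry is zero up to rounding: scores are uncorrelated.
print(np.cov(scores, rowvar=False))
```

The eigenvectors of a symmetric matrix are orthogonal, which gives the orthogonal basis, and the covariance of the projected data is diagonal, which is the uncorrelatedness property.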
