Principal component analysis (PCA) is one of the most popular dimension reduction methods. Standard PCA is known to be sensitive to outliers, and many robust PCA methods have therefore been developed. Among them, Tyler's M-estimator has been shown to be the most robust scatter estimator under elliptical distributions. However, when the underlying distribution is contaminated and deviates from ellipticity, Tyler's M-estimator might not work well. In this article, we apply semiparametric theory to propose a robust semiparametric PCA. The merits of our proposal are twofold. First, it is robust both to heavy-tailed elliptical distributions and to non-elliptical outliers. Second, it pairs well with a data-driven tuning procedure, which is based on the active ratio and can adapt to different degrees of data outlyingness. We derive theoretical properties, including the influence functions of various statistical functionals and asymptotic normality. Simulation studies and a data analysis demonstrate the superiority of our method.
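For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of PCA based on the classical Tyler's M-estimator, not the semiparametric method proposed in the article. It assumes the data are already centered (known location), uses the standard fixed-point iteration with the scatter normalized to have trace p, and the function names `tyler_scatter` and `leading_directions`, the tolerance, and the simulated data are all illustrative choices.

```python
import numpy as np


def tyler_scatter(X, max_iter=200, tol=1e-8):
    """Tyler's M-estimator of scatter for centered data X (n x p).

    Fixed-point iteration; the result is normalized so that trace = p.
    Assumes n > p and that no row of X is exactly zero.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        # d_i = x_i^T sigma^{-1} x_i for every observation
        d = np.einsum("ij,ij->i", X @ np.linalg.inv(sigma), X)
        # Weighted outer-product sum: (p / n) * sum_i x_i x_i^T / d_i
        new = (p / n) * (X / d[:, None]).T @ X
        new *= p / np.trace(new)
        converged = np.linalg.norm(new - sigma, ord="fro") < tol
        sigma = new
        if converged:
            break
    return sigma


def leading_directions(scatter, k):
    """Return the k eigenvectors of a symmetric scatter matrix with largest eigenvalues."""
    vals, vecs = np.linalg.eigh(scatter)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]


# Illustrative heavy-tailed elliptical sample: multivariate t with 3 degrees
# of freedom and shape matrix diag(9, 1, 1), so the first axis dominates.
rng = np.random.default_rng(0)
n, p = 500, 3
z = rng.standard_normal((n, p)) * np.sqrt(np.array([9.0, 1.0, 1.0]))
w = rng.chisquare(3, size=n) / 3
X = z / np.sqrt(w)[:, None]

sigma = tyler_scatter(X)
v1 = leading_directions(sigma, 1)[:, 0]
```

Because each observation is reweighted by its own distance d_i, only the direction of each point matters. This is why the estimator is insensitive to heavy tails within the elliptical family, and also why contamination that breaks ellipticity can still distort it.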