This article studies the robustness of the eigenvalue ordering, an important issue when estimating the leading eigen-subspace by principal component analysis (PCA). Yata and Aoshima (2010) proposed cross-data-matrix PCA (CDM-PCA) and showed that it has smaller bias than PCA in estimating eigenvalues. While CDM-PCA has the potential to estimate the leading eigen-subspace better than the usual PCA, its robustness has not been well recognized. In this article, we first develop a more stable variant of CDM-PCA, which we call product-PCA (PPCA), that provides a more convenient formulation for theoretical investigation. Second, we prove that, in the presence of outliers, PPCA is more robust than PCA in maintaining the correct ordering of the leading eigenvalues. The robustness gain of PPCA comes from the random data partition, and it does not rely on a data down-weighting scheme as most robust statistical methods do. This enables us to establish the surprising finding that, when there are no outliers, PPCA and PCA share the same asymptotic distribution. That is, the robustness gain of PPCA in estimating the leading eigen-subspace incurs no efficiency loss in comparison with PCA. Simulation studies and a face data example are presented to show the merits of PPCA. In conclusion, PPCA has good potential to replace the usual PCA in real applications, whether or not outliers are present.
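The abstract does not spell out the PPCA estimator, so the following is only a rough sketch of the random-partition idea, not the authors' definition. It assumes the data are randomly split into two halves, a sample covariance matrix is computed on each half, and eigenvalue estimates are taken as the square roots of the singular values of the product of the two matrices. The function name `split_product_eigenvalues` and the simulated data are hypothetical.

```python
import numpy as np


def split_product_eigenvalues(X, rng):
    """Hypothetical sketch: eigenvalue estimates from a random two-way split.

    Assumption (not taken from the abstract): the estimates are the square
    roots of the singular values of S1 @ S2, where S1 and S2 are the sample
    covariance matrices of the two random halves of the rows of X.
    """
    n = X.shape[0]
    perm = rng.permutation(n)            # random data partition
    half = n // 2
    X1, X2 = X[perm[:half]], X[perm[half:]]
    S1 = np.cov(X1, rowvar=False)        # covariance of the first half
    S2 = np.cov(X2, rowvar=False)        # covariance of the second half
    # Singular values come back in descending order; the square root puts
    # them on the scale of a single covariance matrix's eigenvalues.
    sv = np.linalg.svd(S1 @ S2, compute_uv=False)
    return np.sqrt(sv)


# Outlier-free simulated data with population eigenvalues 9, 4 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 3)) * np.array([3.0, 2.0, 1.0])

# Ordinary PCA eigenvalues, sorted in descending order for comparison.
pca_vals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
split_vals = split_product_eigenvalues(X, rng)
print(pca_vals, split_vals)
```

On outlier-free data such as this, the two sets of estimates should be close, which is in the spirit of the no-efficiency-loss claim; the sketch does not attempt to reproduce the paper's robustness results.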