Fan et al. [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{47}$(6) (2019) 3009-3031] constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost among multiple servers. However, the theoretical guarantees of their algorithm hold only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by using the robust covariance matrix estimators of Minsker [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{46}$(6A) (2018) 2871-2903] and Ke et al. [$\mathit{Statistical}$ $\mathit{Science}$ $\textbf{34}$(3) (2019) 454-471]. The theoretical results show that when the sampling distribution has symmetric innovation with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical experiments support the theoretical analysis and indicate that the algorithm is robust to heavy-tailed data and outliers.
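The overall pipeline can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes mean-zero data and replaces the estimators of Minsker and Ke et al. with a simplified element-wise truncated second-moment estimator; the function names and the truncation level `tau` are hypothetical choices made for illustration. Each server computes the top-$K$ eigenvectors of its local robust covariance estimate, and the central server averages the resulting projection matrices and extracts the top-$K$ eigenvectors of that average.

```python
import numpy as np


def truncated_covariance(X, tau):
    """Element-wise truncated second-moment estimate (illustrative stand-in).

    Assumes the rows of X are mean-zero samples. Every entry of each outer
    product x_i x_i^T is clipped to [-tau, tau] before averaging, which limits
    the influence of extreme observations.
    """
    prods = X[:, :, None] * X[:, None, :]  # shape (n, d, d)
    return np.clip(prods, -tau, tau).mean(axis=0)


def top_k_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k]


def distributed_robust_pca(local_datasets, k, tau):
    """Aggregate local top-k eigenspaces by averaging projection matrices."""
    d = local_datasets[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in local_datasets:
        # Local step: only the d x k eigenvector matrix would be communicated.
        V = top_k_eigvecs(truncated_covariance(X, tau), k)
        avg_proj += V @ V.T
    avg_proj /= len(local_datasets)
    # Central step: leading eigenvectors of the averaged projection matrix.
    return top_k_eigvecs(avg_proj, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 10, 2
    scales = np.ones(d)
    scales[:k] = [5.0, 4.0]  # two dominant directions
    # Heavy-tailed samples: Student's t with 5 degrees of freedom.
    servers = [rng.standard_t(5, size=(500, d)) * scales for _ in range(10)]
    V_hat = distributed_robust_pca(servers, k, tau=200.0)
    print(V_hat.shape)
```

In this sketch each server transmits only a $d \times K$ matrix rather than its raw data, which is where the communication saving comes from. In practice the truncation level would need to be tuned to the sample size and moment bounds rather than fixed by hand.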