Fan et al. [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{47}$(6) (2019) 3009-3031] proposed a distributed principal component analysis (PCA) algorithm to significantly reduce the communication cost between multiple servers. In this paper, we robustify their distributed algorithm by using robust covariance matrix estimators respectively proposed by Minsker [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{46}$(6A) (2018) 2871-2903] and Ke et al. [$\mathit{Statistical}$ $\mathit{Science}$ $\textbf{34}$(3) (2019) 454-471] instead of the sample covariance matrix. We extend the deviation bound of robust covariance estimators with bounded fourth moments to the case of the heavy-tailed distribution under only bounded $2+\epsilon$ moments assumption. The theoretical results show that after the shrinkage or truncation treatment for the sample covariance matrix, the statistical error rate of the final estimator produced by the robust algorithm is the same as that of sub-Gaussian tails, when $\epsilon \geq 2$ and the sampling distribution is symmetric innovation. While $2 > \epsilon >0$, the rate with respect to the sample size of each server is slower than that of the bounded fourth moment assumption. Extensive numerical results support the theoretical analysis, and indicate that the algorithm performs better than the original distributed algorithm and is robust to heavy-tailed data and outliers.


翻译:[$\mathit{Annals}$\mathit{$\mathit}$$\mathit}$$\mathit{Statistic}$$\mathit{Sdrig}$(2019) 3009-3031] 提出一个分布式主元分析(PCA)算法,以大幅降低多个服务器之间的通信成本。在本文中,我们通过使用明斯克[$\mathit{Annals}$\mathit}$\mathatit}}$$\mathitat}$\tathmatistic}$\statististics}$$\stextbf{46}$(6A) (2018 2871-2903}) 来强化其分布式的算法。我们把强型计算计算器的偏差范围扩大到第四次约束式的重尾部分配, 仅由2\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

0
下载
关闭预览

相关内容

Python分布式计算,171页pdf,Distributed Computing with Python
专知会员服务
108+阅读 · 2020年5月3日
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
Hierarchical Disentangled Representations
CreateAMind
4+阅读 · 2018年4月15日
【推荐】SLAM相关资源大列表
机器学习研究会
10+阅读 · 2017年8月18日
Auto-Encoding GAN
CreateAMind
7+阅读 · 2017年8月4日
VIP会员
相关资讯
Disentangled的假设的探讨
CreateAMind
9+阅读 · 2018年12月10日
Hierarchical Disentangled Representations
CreateAMind
4+阅读 · 2018年4月15日
【推荐】SLAM相关资源大列表
机器学习研究会
10+阅读 · 2017年8月18日
Auto-Encoding GAN
CreateAMind
7+阅读 · 2017年8月4日
Top
微信扫码咨询专知VIP会员