A Linearly Convergent Algorithm for Distributed Principal Component Analysis

Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. An often-overlooked point is that PCA serves not only to reduce data dimensionality but also to yield features that are uncorrelated. This paper focuses on this dual objective of PCA, namely, dimensionality reduction and decorrelation of features, which requires estimating the eigenvectors of a data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors. The ever-increasing volume of data in the modern world often requires storing data samples across multiple machines, which precludes the use of centralized PCA algorithms. Although a few distributed solutions to the PCA problem have been proposed recently, the convergence guarantees and/or communication overhead of these solutions remain a concern. With an eye towards communication efficiency, this paper introduces a single-time-scale distributed PCA algorithm based on a feedforward neural network, termed Distributed Sanger's Algorithm (DSA), that estimates the eigenvectors of a data covariance matrix when data are distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also presented to demonstrate the efficacy of the proposed solution.
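
The abstract does not spell out the DSA iteration, so the following is only a minimal sketch under stated assumptions. It is not the paper's algorithm. It assumes a DGD-style, one-time-scale update. In each iteration, every node does a single round of neighbor averaging with a symmetric, doubly stochastic mixing matrix `W`. It then takes a local step along Sanger's (generalized Hebbian) direction `C_i X - X triu(X^T C_i X)`, computed from its own sample covariance `C_i`. The function names `sanger_direction` and `distributed_sanger_sketch` are hypothetical, as are the ring topology, the step size `alpha` and the iteration count. All of them are chosen for illustration.

```python
import numpy as np


def sanger_direction(C, X):
    """Expected Sanger (generalized Hebbian) update for a d x K estimate X:
    C X - X triu(X^T C X). Column k is driven toward the k-th eigenvector."""
    return C @ X - X @ np.triu(X.T @ C @ X)


def distributed_sanger_sketch(local_data, W, K, alpha=0.02, iters=2000, seed=0):
    """Hypothetical one-time-scale sketch (assumed DGD-style form): per
    iteration, ONE round of neighbor averaging with weights W, then a local
    Sanger step. W[i, j] = 0 unless nodes i and j are neighbors or i == j."""
    rng = np.random.default_rng(seed)
    d = local_data[0].shape[1]
    # Local second-moment (covariance) matrices; data assumed zero-mean.
    covs = [A.T @ A / A.shape[0] for A in local_data]
    X0, _ = np.linalg.qr(rng.standard_normal((d, K)))
    X = [X0.copy() for _ in covs]  # common random initialization
    N = len(X)
    for _ in range(iters):
        # Both comprehensions read the previous iterate X before it is rebound.
        mixed = [sum(W[i, j] * X[j] for j in range(N)) for i in range(N)]
        X = [mixed[i] + alpha * sanger_direction(covs[i], X[i]) for i in range(N)]
    return X, covs


# Usage: 4 nodes on a ring, each holding samples from the same 5-dim Gaussian.
rng = np.random.default_rng(1)
d, K, N, n = 5, 2, 4, 2000
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))      # random eigenbasis
scales = np.sqrt([5.0, 3.0, 1.0, 0.5, 0.2])           # sqrt of eigenvalues
local_data = [(rng.standard_normal((n, d)) * scales) @ Q.T for _ in range(N)]

W = np.zeros((N, N))
for i in range(N):
    for j in (i - 1, i, i + 1):
        W[i, j % N] = 1 / 3  # self weight plus two ring neighbors

X_nodes, covs = distributed_sanger_sketch(local_data, W, K)
_, evecs = np.linalg.eigh(np.mean(covs, axis=0))
top = evecs[:, ::-1][:, :K]  # top-K eigenvectors, largest eigenvalue first
for i, Xi in enumerate(X_nodes):
    # |<x_k, v_k>| per column; values near 1 mean node i found each eigenvector
    print(i, np.round(np.abs(np.sum(Xi * top, axis=0)), 4))
```

Sanger's direction is linear in the covariance. So when the nodes agree, the average of their local steps equals a centralized Sanger step on the averaged covariance. Column k is pulled toward the k-th eigenvector, which means the sketch recovers ordered eigenvectors rather than just their span. With a constant step size, the nodes' estimates in general settle in a small neighborhood of those eigenvectors rather than exactly on them.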

Related content

PCA

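In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space onto a lower-dimensional space by maximizing the variance captured along each successive dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared perpendicular distance from the points to the line. The next best-fitting line is chosen the same way from among directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.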
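
Below is a minimal NumPy sketch of the procedure just described. It assumes eigendecomposition of the sample covariance as the PCA solver and uses synthetic correlated data. The helper `pca_eig`, the mixing matrix and the rotation angle are illustrative choices. The sketch checks that projections onto the principal components have a diagonal covariance. It then shows that a different orthonormal basis of the same 2-D subspace does not decorrelate the features. That is the distinction the abstract draws between estimating the eigenvectors and estimating only their span.

```python
import numpy as np


def pca_eig(data, k):
    """Return (components, variances): top-k eigenvectors of the sample
    covariance (as columns), ordered by decreasing eigenvalue."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)
    evals, evecs = np.linalg.eigh(cov)       # ascending order
    order = np.argsort(evals)[::-1][:k]      # largest k first
    return evecs[:, order], evals[order]


rng = np.random.default_rng(0)
# Correlated 3-D data: mix independent sources with a fixed matrix.
mix = np.array([[2.0, 0.0, 0.0],
                [1.5, 1.0, 0.0],
                [0.5, 0.3, 0.2]])
data = rng.standard_normal((5000, 3)) @ mix.T

V, variances = pca_eig(data, 2)
centered = data - data.mean(axis=0)
features = centered @ V                      # projections onto components
feat_cov = np.cov(features, rowvar=False)
print(np.round(feat_cov, 6))                 # off-diagonals ~ 0

# Same 2-D subspace, different orthonormal basis: features become correlated.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rot_cov = np.cov(centered @ (V @ R), rowvar=False)
print(np.round(rot_cov, 3))                  # off-diagonals clearly nonzero
```

The rotated features span exactly the same subspace and keep the same total variance. Their covariance is `R^T diag(variances) R`, however, and its off-diagonal entries are nonzero whenever the two variances differ.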
