A Linearly Convergent Algorithm for Distributed Principal Component Analysis

Principal Component Analysis (PCA) is the workhorse tool for dimensionality reduction in this era of big data. While often overlooked, the purpose of PCA is not only to reduce data dimensionality, but also to yield features that are uncorrelated. Furthermore, the ever-increasing volume of data in the modern world often requires storage of data samples across multiple machines, which precludes the use of centralized PCA algorithms. This paper focuses on the dual objective of PCA, namely, dimensionality reduction and decorrelation of features, but in a distributed setting. This requires estimating the eigenvectors of the data covariance matrix, as opposed to only estimating the subspace spanned by the eigenvectors, when data is distributed across a network of machines. Although a few distributed solutions to the PCA problem have been proposed recently, convergence guarantees and/or communications overhead of these solutions remain a concern. With an eye towards communications efficiency, this paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the eigenvectors of the data covariance matrix when data is distributed across an undirected and arbitrarily connected network of machines. Furthermore, the proposed algorithm is shown to converge linearly to a neighborhood of the true solution. Numerical results are also provided to demonstrate the efficacy of the proposed solution.
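The abstract does not spell out the update rule, so the following is only a minimal sketch of how a one time-scale distributed Sanger-type iteration might look. It assumes that each node mixes its neighbors' estimates with a doubly stochastic weight matrix and adds a local Sanger (generalized Hebbian) step computed from its own covariance, all in a single loop. The ring topology, the equal weights of 1/3, the step size, the shared initialization, and the function names (`sanger_direction`, `ring_weights`, `distributed_sanger`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sanger_direction(cov, x):
    # Sanger's rule: C X - X * triu(X^T C X); the upper-triangular part
    # deflates each column against the columns that precede it.
    return cov @ x - x @ np.triu(x.T @ cov @ x)

def ring_weights(n_nodes):
    # Assumed topology: undirected ring (n_nodes >= 3) with equal weights
    # on self and the two neighbors, giving a doubly stochastic matrix.
    w = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        for j in (i - 1, i, i + 1):
            w[i, j % n_nodes] = 1.0 / 3.0
    return w

def distributed_sanger(local_covs, weights, k, step=0.05, iters=4000, seed=0):
    # local_covs has shape (n_nodes, d, d): one covariance matrix per node.
    n_nodes, d, _ = local_covs.shape
    rng = np.random.default_rng(seed)
    init, _ = np.linalg.qr(rng.normal(size=(d, k)))
    # Assumption: every node starts from the same orthonormal estimate.
    x = np.repeat(init[None], n_nodes, axis=0)
    for _ in range(iters):
        # One communication round: weighted average of neighbor estimates.
        mixed = np.einsum("ij,jdk->idk", weights, x)
        # Local Sanger step evaluated at each node's own estimate.
        local = np.stack([sanger_direction(c, xi) for c, xi in zip(local_covs, x)])
        x = mixed + step * local
    return x

# Synthetic zero-mean data with a known spectrum, split across six nodes.
rng = np.random.default_rng(1)
d, k, n_nodes, n_local = 5, 2, 6, 500
basis, _ = np.linalg.qr(rng.normal(size=(d, d)))
spectrum = np.array([1.0, 0.6, 0.3, 0.1, 0.05])
local_covs = []
for _ in range(n_nodes):
    samples = (rng.normal(size=(n_local, d)) * np.sqrt(spectrum)) @ basis.T
    local_covs.append(samples.T @ samples / n_local)
local_covs = np.stack(local_covs)

estimates = distributed_sanger(local_covs, ring_weights(n_nodes), k)

# Compare each node's normalized columns with the top-k eigenvectors
# of the network-wide covariance (the average of the local covariances).
global_cov = local_covs.mean(axis=0)
_, vecs = np.linalg.eigh(global_cov)
top = vecs[:, ::-1][:, :k]
unit = estimates / np.linalg.norm(estimates, axis=1, keepdims=True)
alignment = np.abs(np.einsum("dk,ndk->nk", top, unit))
print(alignment.round(3))
```

Each row of `alignment` holds one node's absolute cosine similarity with the first and second eigenvector, so values near 1 indicate that the node has recovered the eigenvectors themselves and not just their span. With a constant step size the iterates are only expected to settle in a neighborhood of the true eigenvectors, which matches the kind of guarantee the abstract describes.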

PCA

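In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line is chosen in the same way from among the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.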
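As a small centralized illustration of this definition, the sketch below computes the principal components from the eigendecomposition of the sample covariance matrix and checks that the projected features are uncorrelated. The synthetic data, the mixing matrix, and the helper name `pca_eig` are assumptions made for the example.

```python
import numpy as np

def pca_eig(data, k):
    # Center the data, form the sample covariance, and keep the k
    # eigenvectors with the largest eigenvalues (largest variance).
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]

# Correlated synthetic data: Gaussian noise passed through a mixing matrix.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 1.0, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.2]])
data = rng.normal(size=(500, 3)) @ mixing

eigvals, components = pca_eig(data, 2)

# Project onto the principal components and inspect the feature covariance.
features = (data - data.mean(axis=0)) @ components
feature_cov = np.cov(features, rowvar=False)
print(feature_cov.round(6))
```

The off-diagonal entries of `feature_cov` vanish up to floating-point error, and its diagonal reproduces the retained eigenvalues, which is the decorrelation property described above.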