Adaptive gradient methods, including Adam, AdaGrad, and their variants, have proven highly successful in training deep learning models such as neural networks. Meanwhile, with the growing demand for distributed computing, distributed optimization algorithms are rapidly becoming a focal point of research. As computing power grows and machine learning models are increasingly deployed on mobile devices, the communication cost of distributed training algorithms requires careful consideration. In this paper, we introduce novel convergent decentralized adaptive gradient methods and rigorously incorporate adaptive gradient methods into decentralized training procedures. Specifically, we propose a general algorithmic framework that converts existing adaptive gradient methods into their decentralized counterparts. In addition, we thoroughly analyze the convergence behavior of the proposed framework and show that if a given adaptive gradient method converges under certain conditions, then its decentralized counterpart also converges. We illustrate the benefit of our generic decentralized framework on a prototypical method, AMSGrad, both theoretically and numerically.
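The abstract does not spell out the framework's update rule, so the following is only a minimal sketch of how an adaptive method such as AMSGrad might be combined with decentralized (gossip-style) averaging over a communication graph. The function name `decentralized_amsgrad_step`, the mixing matrix `W`, and all hyperparameter values are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch (assumption, not the paper's exact algorithm):
# each node mixes parameters with its neighbors, then applies a
# local AMSGrad-style adaptive update on its own stochastic gradient.
import numpy as np

def decentralized_amsgrad_step(params, states, grads, W, lr=1e-3,
                               beta1=0.9, beta2=0.999, eps=1e-8):
    """One synchronous round over all n nodes.

    params: (n, d) array; row i holds node i's parameters.
    states: list of n dicts with moment buffers 'm', 'v', 'v_hat' (shape (d,)).
    grads:  (n, d) array of local stochastic gradients.
    W:      (n, n) doubly stochastic mixing matrix, nonzero only between
            nodes that communicate (illustrative assumption).
    """
    n, _ = params.shape
    # 1) Gossip step: each node averages parameters with its neighbors.
    mixed = W @ params
    # 2) Local AMSGrad-style adaptive update on each node.
    new_params = np.empty_like(params)
    for i in range(n):
        s = states[i]
        s["m"] = beta1 * s["m"] + (1 - beta1) * grads[i]
        s["v"] = beta2 * s["v"] + (1 - beta2) * grads[i] ** 2
        s["v_hat"] = np.maximum(s["v_hat"], s["v"])  # AMSGrad max correction
        new_params[i] = mixed[i] - lr * s["m"] / (np.sqrt(s["v_hat"]) + eps)
    return new_params
```

In this sketch, convergence of the decentralized variant would hinge on both the underlying adaptive update and the mixing matrix `W` (e.g., Metropolis weights on a connected graph), which is consistent with the abstract's statement that the decentralized counterpart inherits convergence under specific conditions.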