It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and, in asynchronous implementations, on the gradient staleness. In particular, it has been observed that the speedup saturates beyond a certain batch size and/or when the delays grow too large. We identify a data-dependent parameter that explains the speedup saturation in both these settings. Our comprehensive theoretical analysis, covering strongly convex, convex, and non-convex settings, unifies and generalizes prior work that often focused on only one of these two aspects. In particular, our approach allows us to derive improved speedup results under frequently considered sparsity assumptions. Our insights give rise to theoretically grounded guidelines on how the learning rate can be adjusted in practice. We show that our results are tight and illustrate key findings in numerical experiments.
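To make the batch-size and staleness setting concrete, here is a minimal Python sketch, not the paper's algorithm: it runs mini-batch SGD with a fixed gradient delay `tau` on a toy least-squares problem. All names (`delayed_sgd`, `minibatch_grad`) and the problem instance are illustrative assumptions, not definitions from this work.

```python
# Minimal sketch (illustrative only): mini-batch SGD where each gradient is
# applied tau iterations after it was computed, mimicking asynchronous delays.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
y = A @ x_star + 0.01 * rng.standard_normal(n)

def minibatch_grad(x, b):
    """Stochastic gradient of 0.5/b * ||A_S x - y_S||^2 for a batch S of size b."""
    idx = rng.choice(n, size=b, replace=False)
    return A[idx].T @ (A[idx] @ x - y[idx]) / b

def delayed_sgd(b, tau, eta, steps=500):
    """SGD with batch size b where every gradient is tau iterations stale."""
    x = np.zeros(d)
    buffer = []                      # gradients waiting to be applied
    for t in range(steps):
        buffer.append(minibatch_grad(x, b))
        if t >= tau:                 # apply the gradient computed tau steps ago
            x = x - eta * buffer.pop(0)
    return 0.5 * np.mean((A @ x - y) ** 2)

# Toy illustration of the batch-size / delay / step-size interplay discussed above.
for b, tau in [(8, 0), (8, 16), (128, 0), (128, 16)]:
    print(f"b={b:4d}  tau={tau:3d}  final loss={delayed_sgd(b, tau, eta=0.1):.4f}")
```

Varying `b`, `tau`, and `eta` in this toy setup only illustrates the qualitative trade-off; the quantitative guidelines referred to in the abstract are derived in the analysis itself.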