Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices - Zhuanzhi Papers

SGD · Networking · contrastive · Federated Learning · Neural Networks

December 2, 2021

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko

From arXiv. Accepted to the Conference on Neural Information Processing Systems (NeurIPS) 2021. 50 pages, 6 figures. Code: https://github.com/yandex-research/moshpit-sgd

Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can utilize hundreds of computers via specialized message-passing protocols such as Ring All-Reduce. However, running these protocols at scale requires reliable high-speed networking that is only available in dedicated clusters. In contrast, many real-world applications, such as federated learning and cloud-based distributed training, operate on unreliable devices with unstable network bandwidth. As a result, these applications are restricted to using parameter servers or gossip-based averaging protocols. In this work, we lift that restriction by proposing Moshpit All-Reduce - an iterative averaging protocol that exponentially converges to the global average. We demonstrate the efficiency of our protocol for distributed optimization with strong theoretical guarantees. The experiments show 1.3x speedup for ResNet-50 training on ImageNet compared to competitive gossip-based strategies and 1.5x speedup when training ALBERT-large from scratch using preemptible compute nodes.
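The idea of an iterative averaging protocol can be illustrated with a toy simulation. The sketch below is not the paper's protocol: it assumes peers are shuffled into random disjoint groups each round (the abstract does not describe how Moshpit All-Reduce forms groups), and the names `group_average_round` and `max_deviation` are hypothetical helpers introduced only for this example. It shows that repeated group averaging keeps the global mean unchanged while every peer's value moves toward it.

```python
import random


def group_average_round(values, group_size, rng):
    """Run one round: split peers into random disjoint groups (an assumption
    of this sketch) and replace each peer's value with its group's mean."""
    order = list(range(len(values)))
    rng.shuffle(order)
    new_values = list(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            new_values[i] = mean
    return new_values


def max_deviation(values):
    """Largest distance between any peer's value and the global average."""
    target = sum(values) / len(values)
    return max(abs(v - target) for v in values)


rng = random.Random(0)                      # fixed seed for repeatability
values = [float(i) for i in range(64)]      # one scalar "parameter" per peer
global_mean = sum(values) / len(values)

history = [max_deviation(values)]
for _ in range(10):
    values = group_average_round(values, group_size=8, rng=rng)
    history.append(max_deviation(values))

for round_index, deviation in enumerate(history):
    print(f"round {round_index}: max deviation = {deviation:.6f}")
```

Because each group's values are replaced by their mean, the sum over all peers is preserved, so the quantity the peers converge to is the original global average. No peer needs to talk to all others in any single round, which is the property that makes group-based averaging attractive on unreliable networks.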
