Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been growing interest in using gradient compression to reduce the communication overhead of distributed training. However, little is understood about applying gradient compression to adaptive gradient methods. Moreover, its performance benefits are often limited by the non-negligible compression overhead. In this paper, we first introduce a novel adaptive gradient method with gradient compression. We show that the proposed method has a convergence rate of $\mathcal{O}(1/\sqrt{T})$ for non-convex problems. In addition, we develop a scalable system called BytePS-Compress for two-way compression, where gradients are compressed in both directions between workers and parameter servers. BytePS-Compress pipelines compression and decompression on CPUs and achieves a high degree of parallelism. Empirical evaluations show that we improve the training time of ResNet50, VGG16, and BERT-base by 5.0%, 58.1%, and 23.3%, respectively, with 25 Gb/s networking and no loss of accuracy. Furthermore, for training the BERT models, we achieve a compression rate of 333x compared to mixed-precision training.
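The abstract does not specify the exact compressor or update rule, so the following is only a minimal NumPy sketch of the general idea of combining gradient compression with an adaptive gradient step: each worker compresses its gradient with error feedback before communication, and the adaptive update is applied to the aggregated, decompressed gradient. The compressor (`compress_signsgd`), the Adam-style step, and all names here are illustrative assumptions, not the paper's algorithm or the BytePS-Compress API.

```python
import numpy as np

def compress_signsgd(g):
    """Illustrative 1-bit sign compressor with a per-tensor scale (assumed, not from the paper)."""
    scale = np.abs(g).mean()
    return np.sign(g), scale

def decompress(sign, scale):
    """Reconstruct an approximate gradient from the compressed form."""
    return sign * scale

class CompressedAdaptiveWorker:
    """Sketch of one worker: error-feedback compression plus an Adam-style adaptive update."""

    def __init__(self, dim, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(dim)    # first-moment estimate
        self.v = np.zeros(dim)    # second-moment estimate
        self.err = np.zeros(dim)  # error-feedback memory for the compressor
        self.t = 0

    def local_compress(self, grad):
        # Add the residual left over from the previous round, then compress.
        corrected = grad + self.err
        sign, scale = compress_signsgd(corrected)
        self.err = corrected - decompress(sign, scale)  # keep what the compressor dropped
        return sign, scale

    def apply(self, params, aggregated_grad):
        # Adam-style step on the (decompressed) gradient aggregated by the server.
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * aggregated_grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * aggregated_grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage sketch: one worker, one step, on a toy gradient.
worker = CompressedAdaptiveWorker(dim=4)
params = np.zeros(4)
sign, scale = worker.local_compress(np.array([0.3, -0.1, 0.05, -0.4]))
params = worker.apply(params, decompress(sign, scale))
```

In a two-way setting as described in the abstract, the server would apply the same kind of compression to the aggregated gradient before sending it back to workers; that direction is omitted here for brevity.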