Distributed stochastic gradient descent (SGD) is widely used in large-scale deep learning, and the gradient aggregation method is vital to the training scalability of a distributed deep learning system. Collective communication such as AllReduce has been widely adopted in distributed SGD to reduce communication time. However, AllReduce consumes substantial bandwidth even though gradients are often sparse: many gradient values are zero and should be compressed efficiently to save bandwidth. To reduce the communication overhead of sparse gradients, we propose Sparse-Sketch Reducer (S2 Reducer), a novel sketch-based sparse gradient aggregation method with convergence guarantees. S2 Reducer reduces the communication cost by compressing only the non-zero gradients with a count-sketch and a bitmap, and it enables efficient AllReduce operations for parallel SGD training. We perform an extensive evaluation against four state-of-the-art methods on five training models. Our results show that S2 Reducer converges to the same accuracy, reduces sparse communication overhead by 81\%, and achieves a 1.8$\times$ speedup over state-of-the-art approaches.
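For illustration only, the minimal sketch below (not the authors' implementation) shows one way a sparse gradient could be compressed into a count-sketch of its non-zero values plus a bitmap of their positions; both structures are linear, so element-wise summation across workers (e.g., via AllReduce) merges them directly. The class and function names, sketch dimensions, and the NumPy-based encoding are assumptions made for this example.

\begin{verbatim}
# Illustrative sketch, assuming a count-sketch + bitmap encoding of
# sparse gradients; not the paper's actual S2 Reducer code.
import numpy as np

class CountSketch:
    """Count-sketch over a fixed coordinate space of size `dim`."""
    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)   # same seed on every worker -> identical hashes
        self.rows, self.cols = rows, cols
        self.bucket = rng.integers(0, cols, size=(rows, dim))  # coordinate -> bucket, per row
        self.sign = rng.choice([-1.0, 1.0], size=(rows, dim))  # random +/-1 sign, per row
        self.table = np.zeros((rows, cols))

    def insert(self, idx, vals):
        # Accumulate each (index, value) pair into every row of the table.
        for r in range(self.rows):
            np.add.at(self.table[r], self.bucket[r, idx], self.sign[r, idx] * vals)

    def query(self, idx):
        # Median-of-rows estimate of the values at the given coordinates.
        est = [self.sign[r, idx] * self.table[r, self.bucket[r, idx]]
               for r in range(self.rows)]
        return np.median(np.stack(est), axis=0)

def compress(grad, rows=5, cols=256, seed=0):
    bitmap = grad != 0                      # bitmap of non-zero positions
    idx = np.flatnonzero(bitmap)
    sketch = CountSketch(rows, cols, grad.size, seed)
    sketch.insert(idx, grad[idx])           # only non-zero values enter the sketch
    return bitmap, sketch                   # both merge by element-wise addition (AllReduce-friendly)

def decompress(bitmap, sketch):
    grad = np.zeros(bitmap.size)
    idx = np.flatnonzero(bitmap)
    grad[idx] = sketch.query(idx)           # approximate recovery of the non-zero values
    return grad
\end{verbatim}

In this illustrative setup, the shared random seed makes all workers build identical hash functions, which is what allows their sketch tables and bitmaps to be summed element-wise; the merged bitmap records which coordinates to recover, and the merged count-sketch answers approximate value queries for those coordinates.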