FastSGD: A Fast Compressed SGD Framework for Distributed Machine Learning - 专知 (Zhuanzhi) Papers

Reducible · SGD · Distributed Machine Learning · ML · Machine Learning Modeling

December 8, 2021

FastSGD: A Fast Compressed SGD Framework for Distributed Machine Learning

Keyu Yang, Lu Chen, Zhihao Zeng, Yunjun Gao

With the rapid growth of big data, distributed Machine Learning (ML) has been widely applied to training large-scale models. Stochastic Gradient Descent (SGD) is arguably the workhorse algorithm of ML. Distributed ML models trained by SGD involve large amounts of gradient communication, which limits the scalability of distributed ML. Thus, it is important to compress the gradients to reduce communication. In this paper, we propose FastSGD, a Fast compressed SGD framework for distributed ML. To achieve a high compression ratio at a low cost, FastSGD represents the gradients as key-value pairs and compresses both the gradient keys and values in linear time. For gradient value compression, FastSGD first uses a reciprocal mapper to transform the original values into reciprocal values, and then applies logarithmic quantization to further reduce the reciprocal values to small integers. Finally, FastSGD filters the reduced gradient integers by a given threshold. For gradient key compression, FastSGD provides an adaptive fine-grained delta encoding method to store gradient keys with fewer bits. Extensive experiments on practical ML models and datasets demonstrate that FastSGD achieves a compression ratio of up to 4 orders of magnitude and speeds up convergence by up to 8x, compared with state-of-the-art methods.
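The abstract names the stages of the pipeline but does not give their formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes the reciprocal mapper is `1 / |g|`, that logarithmic quantization rounds the base-`base` logarithm of that reciprocal, that the filter drops entries whose integer code exceeds `threshold` (that is, small-magnitude gradients), and that keys are stored as plain gaps between consecutive indices rather than with the adaptive fine-grained scheme the paper describes. The names `compress`, `decompress`, `base`, and `threshold` are hypothetical.

```python
import math


def compress(grad, base=2.0, threshold=8):
    """Compress a dense gradient list into (key deltas, value codes).

    All formulas here are assumptions made for illustration.
    """
    keys, codes = [], []
    for i, g in enumerate(grad):
        if g == 0.0:
            continue                      # nothing to send for zero entries
        r = 1.0 / abs(g)                  # assumed reciprocal mapping
        q = round(math.log(r, base))      # assumed logarithmic quantization
        if q > threshold:
            continue                      # assumed filter: drop tiny gradients
        keys.append(i)
        codes.append((q, g < 0))          # small integer plus a sign flag
    # Plain delta encoding: store the gap to the previous kept key.
    deltas = [k - p for k, p in zip(keys, [0] + keys[:-1])]
    return deltas, codes


def decompress(deltas, codes, size, base=2.0):
    """Rebuild an approximate dense gradient from the compressed form."""
    grad = [0.0] * size
    key = 0
    for d, (q, negative) in zip(deltas, codes):
        key += d                          # undo the delta encoding
        magnitude = base ** (-q)          # invert the log and the reciprocal
        grad[key] = -magnitude if negative else magnitude
    return grad


if __name__ == "__main__":
    g = [0.0, 0.25, 0.0, 0.0, -0.5, 0.001]
    deltas, codes = compress(g)
    print(deltas)                         # [1, 3]
    print(codes)                          # [(2, False), (1, True)]
    print(decompress(deltas, codes, len(g)))
```

Both functions make a single pass over their input, which matches the linear-time claim in spirit. In this sketch the entry `0.001` is filtered out because its code (10) exceeds the threshold, and values that are not exact powers of `base` are recovered only approximately.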
