In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures such as stochastic gradient descent (SGD) can be leveraged to mitigate the effect of slow or unresponsive workers, called stragglers, which would otherwise degrade the benefit of outsourcing the computation. This can be done by waiting for only a subset of the workers to finish their computation at each iteration of the algorithm. Previous works proposed adapting the number of workers to wait for as the algorithm progresses in order to optimize the convergence speed. In contrast, we model the communication and computation times as independent random variables. Based on this model, we construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm. Consequently, we improve the convergence speed of distributed SGD while significantly reducing the computation load, at the expense of a slight increase in communication load.
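To make the "wait for the fastest subset of workers" idea concrete, the following is a minimal, self-contained simulation sketch, not the paper's actual scheme. All names (`worker_gradient`, `simulate_round`), the toy linear-regression task, the exponential delay distributions, and the simple rule for growing the number of workers to wait for over iterations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): distributed linear regression with squared loss.
n_workers, d, samples_per_worker = 10, 5, 20
w_true = rng.normal(size=d)
X = [rng.normal(size=(samples_per_worker, d)) for _ in range(n_workers)]
y = [Xi @ w_true + 0.1 * rng.normal(size=samples_per_worker) for Xi in X]

def worker_gradient(i, w):
    """Local gradient of the squared loss on worker i's data partition."""
    Xi, yi = X[i], y[i]
    return 2.0 / len(yi) * Xi.T @ (Xi @ w - yi)

def simulate_round(w, k, lr=0.05):
    """One SGD iteration: draw independent random computation and communication
    delays per worker and aggregate gradients only from the k fastest workers."""
    delays = rng.exponential(1.0, n_workers) + rng.exponential(0.2, n_workers)
    fastest = np.argsort(delays)[:k]
    grad = np.mean([worker_gradient(i, w) for i in fastest], axis=0)
    wall_time = np.sort(delays)[k - 1]  # central node waits for the k-th arrival
    return w - lr * grad, wall_time

w = np.zeros(d)
total_time = 0.0
for t in range(200):
    # Illustrative adaptation rule (an assumption, not the paper's policy):
    # wait for few workers early on, and for more workers in later iterations,
    # when less noisy gradient estimates are needed.
    k = min(n_workers, 2 + t // 50)
    w, dt = simulate_round(w, k)
    total_time += dt

print("final error:", np.linalg.norm(w - w_true), "simulated wall time:", total_time)
```

Under this kind of delay model, waiting for fewer workers shortens each iteration but yields noisier gradient estimates; adapting the number of workers (and, in the paper's scheme, also the per-worker computation load) trades these effects off over the run of the algorithm.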