Efficient Byzantine-Resilient Stochastic Gradient Desce

August 15, 2021

Kaiyun Li, Xiaojun Chen, Ye Dong, Peng Zhang, Dakui Wang, Shuai Zen

From arXiv, 7 pages, 3 figures

Distributed learning often suffers from Byzantine failures, and a number of works have studied distributed stochastic optimization under such failures, where only a portion of the workers, rather than all workers in the distributed learning system, compute stochastic gradients at each iteration. These methods, although workable under Byzantine failures, suffer from either a sub-optimal convergence rate or a high computation cost. To this end, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) which is provably robust against Byzantine failures. BrSGD obtains optimal statistical performance and efficient computation simultaneously. In particular, BrSGD can achieve an order-optimal statistical error rate for strongly convex loss functions. The computational complexity of BrSGD is O(md), where d is the model dimension and m is the number of machines. Experimental results show that BrSGD can obtain competitive results compared with non-Byzantine machines in terms of effectiveness and convergence.
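
The abstract does not spell out BrSGD's aggregation rule, so the following is only a minimal sketch of the general setting it describes: m workers send d-dimensional gradients, some of them Byzantine, and the server combines them robustly. The coordinate-wise median used here is a well-known robust aggregator swapped in as a stand-in; it is not claimed to be the paper's method. The function names, the quadratic loss, the target vector, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def coordinate_median(gradients):
    """Combine m gradient vectors of length d by taking the median of each coordinate.

    Stand-in aggregator (assumption): this is NOT the BrSGD rule from the paper.
    """
    return [statistics.median(column) for column in zip(*gradients)]


def robust_sgd(m=10, num_byzantine=3, steps=200, lr=0.1, seed=0):
    """Minimize the strongly convex loss f(w) = 0.5 * ||w - target||^2.

    The first `num_byzantine` of the m workers send corrupted gradients.
    """
    rng = random.Random(seed)
    target = [1.0, -2.0, 3.0]  # hypothetical optimum, chosen for illustration
    w = [0.0] * len(target)
    for _ in range(steps):
        gradients = []
        for worker in range(m):
            if worker < num_byzantine:
                # Byzantine worker: reports an arbitrary, very large vector.
                gradients.append([1e6] * len(w))
            else:
                # Honest worker: true gradient (w - target) plus Gaussian noise.
                gradients.append(
                    [wi - ti + rng.gauss(0.0, 0.1) for wi, ti in zip(w, target)]
                )
        g = coordinate_median(gradients)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, target
```

Calling `robust_sgd()` returns the final iterate together with the target. Because fewer than half of the workers are corrupted, each coordinate's median is still determined by honest values, so the iterate stays near the target, whereas a plain average would be dominated by the 1e6 entries. Note that `statistics.median` sorts each coordinate, so this sketch costs more than the O(md) stated in the abstract.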
