Escaping Saddle Points with Compressed SGD - 专知 (Zhuanzhi) Papers


SGD · saddle point · Lipschitz · stationary point · stationary

May 21, 2021

Escaping Saddle Points with Compressed SGD


Dmitrii Avdiukhin, Grigory Yaroslavtsev

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an $\varepsilon$-first-order stationary point. In this paper, we extend these results to convergence to an $\varepsilon$-second-order stationary point ($\varepsilon$-SOSP), which is, to the best of our knowledge, the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with the RandomK compressor converges to an $\varepsilon$-SOSP with the same number of iterations as uncompressed SGD [Jin et al., 2021] (JACM), while improving the total communication by a factor of $\tilde \Theta(\sqrt{d} \varepsilon^{-3/4})$, where $d$ is the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.
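To make the idea of compressed SGD with a RandomK compressor concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes an unscaled RandomK (keep $k$ of the $d$ coordinates, zero the rest), adds an error-feedback buffer that the abstract does not mention, and uses a hypothetical toy objective $f(x) = \tfrac14 x_1^4 - \tfrac12 x_1^2 + \tfrac12 \sum_{i \ge 2} x_i^2$ with a saddle point at the origin and minima at $x_1 = \pm 1$. The names `random_k`, `compressed_sgd`, and `noisy_grad` are illustrative only.

```python
import numpy as np

def random_k(vec, k, rng):
    """Unscaled RandomK: keep k uniformly chosen coordinates, zero the rest."""
    out = np.zeros_like(vec)
    idx = rng.choice(vec.size, size=k, replace=False)
    out[idx] = vec[idx]
    return out

def compressed_sgd(grad_fn, x0, k, lr, steps, seed=0):
    """SGD in which only a RandomK-compressed gradient is applied each step.

    Assumption of this sketch: the part dropped by the compressor is kept in
    `error` and added back before the next compression (error feedback).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    error = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x) + error       # stochastic gradient plus stored residual
        sent = random_k(g, k, rng)   # only k of d coordinates are "communicated"
        error = g - sent             # remember what the compressor dropped
        x -= lr * sent
    return x

# Hypothetical toy problem: saddle at the origin, minima at x[0] = +1 and -1.
noise_rng = np.random.default_rng(123)

def noisy_grad(x):
    grad = x.copy()
    grad[0] = x[0] ** 3 - x[0]       # derivative of x^4/4 - x^2/2
    return grad + 0.01 * noise_rng.standard_normal(x.size)

# Start exactly at the saddle; gradient noise pushes the iterate away from it.
x_final = compressed_sgd(noisy_grad, np.zeros(4), k=2, lr=0.05, steps=3000)
print(x_final)
```

In this toy run each step transmits 2 of 4 coordinates. The iterate starts at the saddle and, driven by the gradient noise, should settle near one of the two minima, although which one depends on the random seeds.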


