托盘梯级裂缝的稳定性和汇合性:在利普施基茨之后的连续性和平稳性 (Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness) - 专知论文

会员服务 ·

0

梯度截断 · SGD · Lipschitz连续 · Continuity · Lipschitz ·

2021 年 6 月 10 日

Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness

翻译：托盘梯级裂缝的稳定性和汇合性:在利普施基茨之后的连续性和平稳性

Vien V. Mai,Mikael Johansson

from arxiv, ICML-2021

Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.

翻译：当应用到没有Lipschitz持续和(或)捆绑梯度的函数时,沙变梯度算法往往不稳定。渐变剪切是一种简单而有效的技术,可以稳定容易引发梯度爆炸的问题的培训过程。尽管其广受欢迎,但梯变剪切脂质的趋同特性不易理解,特别是对于沙变问题。本文件确定了剪切的沙变(次)梯度(次)法(SGD)在质量和数量两方面的趋同结果,用于与快速增长的亚梯度相交的非moose convex函数。我们的分析显示,剪切的SGD算法提高了SGD的稳定性,而且在许多情况下,剪切的SGD算法具有有限的趋同率。我们还研究了剪切方法与势头的趋同性,其中包括剪切的SGD,作为标准假设下较弱的粘结问题的特殊案例。我们用新的Lyapunov分析显示,拟议的方法在质量和数量上达到了最为人所知的水平,显示了这个制度下剪切方法的有效性。

0

相关内容

梯度截断

截断，即通过某个阈值来控制系数的大小，若系数小于某个阈值便将该系数设置为0，即简单截断。

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

44+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

专知会员服务

64+阅读 · 2020年12月11日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

【Facebook AI】低资源机器翻译，74页ppt

【Facebook AI】低资源机器翻译，74页ppt

专知会员服务

30+阅读 · 2020年4月8日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

人工智能 | NIPS 2019等国际会议信息8条

人工智能 | NIPS 2019等国际会议信息8条

Call4Papers

7+阅读 · 2019年3月21日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

《模式识别与机器学习(PRML)》正式开放免费下载

《模式识别与机器学习(PRML)》正式开放免费下载

AINLP

27+阅读 · 2018年11月27日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

论文 | CIKM2017 最佳论文鉴赏

论文 | CIKM2017 最佳论文鉴赏

机器学习研究会

4+阅读 · 2017年12月19日

NIPS 2017：贝叶斯深度学习与深度贝叶斯学习（讲义+视频）

NIPS 2017：贝叶斯深度学习与深度贝叶斯学习（讲义+视频）

机器学习研究会

36+阅读 · 2017年12月10日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Finite Element Approximation of Steady Flows of Colloidal Solutions

Arxiv

0+阅读 · 2021年8月5日

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

Arxiv

0+阅读 · 2021年8月5日

Improving the Power of Economic Experiments Using Adaptive Designs

Improving the Power of Economic Experiments Using Adaptive Designs

Arxiv

0+阅读 · 2021年8月5日

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Arxiv

4+阅读 · 2021年7月5日

Lipschitz Lifelong Reinforcement Learning

Arxiv

4+阅读 · 2020年1月17日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Arxiv

3+阅读 · 2018年10月1日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

VIP会员

文章信息

相关主题

Lipschitz连续

相关VIP内容

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

【ETH】最新《几何数据分析》2020课程，附PPT下载

专知会员服务

44+阅读 · 2020年12月18日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

专知会员服务

64+阅读 · 2020年12月11日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

(普林斯顿讲义)：高维概率论，326页pdf《Probability in High Dimension》

专知会员服务

122+阅读 · 2020年5月30日

【Facebook AI】低资源机器翻译，74页ppt

【Facebook AI】低资源机器翻译，74页ppt

专知会员服务

30+阅读 · 2020年4月8日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

扩散模型中的 Transformer：图像生成及其延展应用询问 ChatGPT

281页pdf《神经网络设计入门》

【普林斯顿博士论文】以奖励推动生成式人工智能的发展：奖励引导生成的理论与方法

中文版 | 火力支援与巡飞弹药的未来（附原文）

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

人工智能 | NIPS 2019等国际会议信息8条

人工智能 | NIPS 2019等国际会议信息8条

Call4Papers

7+阅读 · 2019年3月21日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

《模式识别与机器学习(PRML)》正式开放免费下载

《模式识别与机器学习(PRML)》正式开放免费下载

AINLP

27+阅读 · 2018年11月27日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

论文 | CIKM2017 最佳论文鉴赏

论文 | CIKM2017 最佳论文鉴赏

机器学习研究会

4+阅读 · 2017年12月19日

NIPS 2017：贝叶斯深度学习与深度贝叶斯学习（讲义+视频）

NIPS 2017：贝叶斯深度学习与深度贝叶斯学习（讲义+视频）

机器学习研究会

36+阅读 · 2017年12月10日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Finite Element Approximation of Steady Flows of Colloidal Solutions

Arxiv

0+阅读 · 2021年8月5日

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

Arxiv

0+阅读 · 2021年8月5日

Improving the Power of Economic Experiments Using Adaptive Designs

Improving the Power of Economic Experiments Using Adaptive Designs

Arxiv

0+阅读 · 2021年8月5日

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Arxiv

4+阅读 · 2021年7月5日

Lipschitz Lifelong Reinforcement Learning

Arxiv

4+阅读 · 2020年1月17日

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Arxiv

8+阅读 · 2018年11月21日

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Arxiv

3+阅读 · 2018年10月1日

Accelerated Randomized Coordinate Descent Algorithms for Stochastic Optimization and Online Learning

Arxiv

9+阅读 · 2018年7月16日

Asynchronous Byzantine Machine Learning (the case of SGD)

Arxiv

3+阅读 · 2018年7月9日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

微信扫码咨询专知VIP会员