On the Last Iterate Convergence of Momentum Methods - Zhuanzhi Papers

Momentum methods · Momentum · Optimizers · Lipschitz · Convex functions

February 13, 2021

On the Last Iterate Convergence of Momentum Methods

Xiaoyu Li, Mingrui Liu, Francesco Orabona

SGD with Momentum (SGDM) is widely used for large-scale optimization of machine learning problems. Yet, the theoretical understanding of this algorithm is not complete. In fact, even the most recent results require changes to the algorithm, such as an averaging scheme and a projection onto a bounded domain, which are never used in practice. Also, no lower bound is known for SGDM. In this paper, we prove for the first time that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from an error $\Omega(\frac{\log T}{\sqrt{T}})$ after $T$ steps. Based on this fact, we study a new class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has optimal convergence $O(\frac{1}{\sqrt{T}})$ for unconstrained convex optimization problems. Further, we show that in the interpolation setting with convex and smooth functions, our new SGDM algorithm automatically converges at a rate of $O(\frac{\log T}{T})$. Empirical results are shown as well.
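For orientation, the baseline the abstract refers to (SGDM with a constant momentum factor, reporting the last iterate without averaging or projection) can be sketched as below. This is a minimal illustration of one common heavy-ball form of the update, not the paper's FTRL-based algorithm and not its lower-bound construction. The function names, the test function $f(x) = |x|$, and the step size schedule $\eta_t = \text{base\_lr}/\sqrt{t}$ are all assumptions made for the example.

```python
import math
import random


def sgdm_last_iterate(subgrad, x0, steps, momentum=0.9, base_lr=0.1,
                      noise_std=0.0, seed=0):
    """Run constant-momentum SGDM and return the last iterate.

    No averaging and no projection are applied. All parameter names and
    defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    for t in range(1, steps + 1):
        # Stochastic subgradient: exact subgradient plus Gaussian noise.
        g = subgrad(x) + rng.gauss(0.0, noise_std)
        # Heavy-ball momentum buffer with a constant momentum factor.
        m = momentum * m + g
        # Assumed step size schedule eta_t = base_lr / sqrt(t).
        x = x - (base_lr / math.sqrt(t)) * m
    return x


def abs_subgrad(x):
    """A subgradient of f(x) = |x|, which is convex and 1-Lipschitz."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


if __name__ == "__main__":
    x_T = sgdm_last_iterate(abs_subgrad, x0=1.0, steps=1000)
    print("suboptimality of the last iterate:", abs(x_T))
```

Since the minimum of $|x|$ is 0, the printed value is the suboptimality $f(x_T) - f(x^\star)$ of the last iterate. Running the sketch for several values of `steps` gives a rough feel for how the last iterate behaves, but it does not reproduce the paper's bounds.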
