SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients


June 30, 2021


Feihu Huang,Junyi Li,Heng Huang

From arXiv: 18 pages, 5 figures. We add the detailed proofs and correct some typos.

Adaptive gradient methods have shown excellent performance in solving many machine learning problems. Although multiple adaptive methods have recently been studied, they mainly focus on either empirical or theoretical aspects, and they work only for specific problems by using specific adaptive learning rates. It is desirable to design a universal framework for practical adaptive gradient algorithms with theoretical guarantees for solving general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate momentum and variance-reduction techniques. In particular, our novel framework provides convergence analysis support for adaptive gradient methods in the nonconvex setting. In the theoretical analysis, we prove that our new algorithm achieves the best-known complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In the numerical experiments, we use various deep learning tasks to validate that our algorithm consistently outperforms existing adaptive algorithms.
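The abstract names three ingredients: an adaptive matrix, momentum, and variance reduction. The following is a minimal sketch of how such pieces might fit together in one update loop; it is not the authors' implementation. Everything specific here is an assumption made for illustration: the toy objective $f(x) = \tfrac{1}{2}\|x\|^2$ with Gaussian gradient noise, the diagonal Adam-style matrix `diag(sqrt(v) + lam)`, the recursive (STORM-style) gradient estimator, the function name `adaptive_vr_sketch`, and the constant values of `gamma`, `mu`, `alpha`, `beta`, and `lam`.

```python
import random


def grad(x, xi):
    """Stochastic gradient of the toy objective f(x) = 0.5 * ||x||^2.

    `xi` is the noise sample; reusing the same `xi` at two points lets the
    variance-reduced estimator cancel part of the noise.
    """
    return [xj + nj for xj, nj in zip(x, xi)]


def adaptive_vr_sketch(x0, steps=1000, gamma=0.1, mu=0.5, alpha=0.1,
                       beta=0.999, lam=1e-3, noise=0.1, seed=0):
    """Hypothetical adaptive-gradient loop with a variance-reduced estimator."""
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)

    # Initial gradient estimate and second-moment accumulator.
    xi = [rng.gauss(0.0, noise) for _ in range(d)]
    g = grad(x, xi)
    v = [gj * gj for gj in g]

    for _ in range(steps):
        # Assumed adaptive matrix: diagonal, Adam-style, kept positive by lam.
        h = [vj ** 0.5 + lam for vj in v]

        # Unconstrained preconditioned step, then a momentum-style average
        # between the current point and the proposed point.
        x_tilde = [xj - gamma * gj / hj for xj, gj, hj in zip(x, g, h)]
        x_new = [xj + mu * (tj - xj) for xj, tj in zip(x, x_tilde)]

        # Recursive variance-reduced estimator: evaluate the SAME sample at
        # the new and old points and correct the previous estimate.
        xi = [rng.gauss(0.0, noise) for _ in range(d)]
        g_new = grad(x_new, xi)
        g_old = grad(x, xi)
        g = [gn + (1.0 - alpha) * (gj - go)
             for gn, gj, go in zip(g_new, g, g_old)]

        # Exponential moving average of squared stochastic gradients.
        v = [beta * vj + (1.0 - beta) * gn * gn for vj, gn in zip(v, g_new)]
        x = x_new

    return x


if __name__ == "__main__":
    print(adaptive_vr_sketch([5.0, -3.0]))
```

Swapping the line that builds `h` for a different positive diagonal (for example, one based on gradient norms) changes the adaptive rule without touching the rest of the loop, which is the kind of flexibility the abstract attributes to a universal adaptive matrix.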


