Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
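As a rough illustration of the layer-wise normalization mentioned above, the sketch below shows a LARS-style heavy-ball momentum step for a single layer's weights. It is a minimal sketch, not the paper's exact formulation: the parameter names (`lr`, `trust_coeff`, `eps`) and the precise placement of the trust ratio are illustrative assumptions, since published LARS implementations differ in these details.

```python
import numpy as np

def lars_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9,
                       weight_decay=1e-4, trust_coeff=0.001, eps=1e-9):
    """One LARS-style heavy-ball step for a single layer's weights.

    The layer-wise normalization rescales the raw step by a trust ratio
    proportional to ||w|| / ||grad + weight_decay * w||, so each layer's
    update is scaled relative to its own weight norm.
    """
    update = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    # Trust ratio; fall back to 1.0 if either norm is zero.
    if w_norm > 0 and u_norm > 0:
        trust_ratio = trust_coeff * w_norm / (u_norm + eps)
    else:
        trust_ratio = 1.0
    velocity = momentum * velocity + trust_ratio * update
    w = w - lr * velocity
    return w, velocity
```

LAMB applies the same idea to Adam: the Adam update direction for each layer is rescaled by an analogous layer-wise trust ratio before being applied. Dropping this rescaling recovers the standard Heavy-ball and Adam updates that the paper uses as baselines.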