Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However, its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular, for ResNet the loss for any fixed mini-batch can be accurately modeled by a quadratic function, and a very low loss value can be reached in just one step of gradient descent with a sufficiently large learning rate. We propose a simple model that allows us to analyze the relationship between the gradients of stochastic mini-batches and the full batch. Our analysis allows us to discover an equivalence between iterate averaging and specific learning rate schedules. In particular, for Exponential Moving Average (EMA) and Stochastic Weight Averaging (SWA) we show that our proposed model matches the observed training trajectories on ImageNet. Our theoretical model predicts that an even simpler averaging technique, averaging just two points that are many steps apart, significantly improves accuracy compared to the baseline. We validate our findings on ImageNet and other datasets using the ResNet architecture.
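As a concrete illustration of the averaging schemes mentioned above, the following is a minimal PyTorch sketch of (i) a standard EMA update of model weights and (ii) the simpler two-point scheme that averages two checkpoints taken many steps apart. The helper names, the decay value, and the mixing coefficient alpha are illustrative assumptions and are not taken from the paper.

```python
import copy
import torch


def ema_update(ema_model, model, decay=0.999):
    """One standard EMA step over the parameters of `model` into `ema_model`.

    `decay` is an assumed typical value; buffers (e.g. BatchNorm statistics)
    are not handled in this sketch.
    """
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)


def average_two_checkpoints(model_a, model_b, alpha=0.5):
    """Average the parameters of two checkpoints taken many SGD steps apart.

    Hypothetical helper illustrating the two-point averaging discussed in the
    abstract; equal weighting (alpha=0.5) is an assumption.
    """
    averaged = copy.deepcopy(model_a)
    with torch.no_grad():
        for p_avg, p_a, p_b in zip(averaged.parameters(),
                                   model_a.parameters(),
                                   model_b.parameters()):
            p_avg.copy_(alpha * p_a + (1.0 - alpha) * p_b)
    return averaged
```

In practice one would save a checkpoint partway through training and another at the end, then evaluate the model returned by `average_two_checkpoints`; this is only a sketch of the idea, not the paper's exact procedure.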