We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative -- the derivative of the Hessian in the leading eigenvector direction -- that encourages drift toward wider minima.
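As a concrete illustration of the oscillation described above, here is a minimal numerical sketch (ours, not the paper's code) that applies the normalized SAM update of Foret et al. to a two-dimensional convex quadratic. The matrix H, the step size, the perturbation radius, and the initialization are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's code): the normalized SAM update
# of Foret et al. applied to a convex quadratic L(w) = 0.5 * w^T H w.
# H, eta, rho, and the initialization are hypothetical choices for illustration.
import numpy as np

H = np.diag([10.0, 1.0])   # largest curvature along the first coordinate
eta, rho = 0.05, 0.1       # assumed step size and SAM perturbation radius

def grad(w):
    """Gradient of the quadratic loss L(w) = 0.5 * w^T H w."""
    return H @ w

w = np.array([0.3, 0.3])
for t in range(200):
    g = grad(w)
    w_adv = w + rho * g / np.linalg.norm(g)  # ascent step to the perturbed point
    w = w - eta * grad(w_adv)                # gradient step evaluated at that point
    if t >= 196:
        print(t, w)
# Late iterates flip the sign of the first coordinate each step while the second
# coordinate decays, i.e. they settle into a cycle that bounces across the
# minimum along the direction of largest curvature.
```

Under these assumed values, the printed iterates alternate around roughly ±0.033 in the first coordinate, matching the kind of two-point cycle the abstract describes for the quadratic case.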