

January 5, 2023

How Does Sharpness-Aware Minimization Minimize Sharpness?


Kaiyue Wen, Tengyu Ma, Zhiyuan Li

From arXiv, 94 pages, 1 figure

Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for improving the generalization of deep neural networks for various settings. However, the underlying working of SAM remains elusive because of various intriguing approximations in the theoretical characterizations. SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees. The subtle differences in these notions of sharpness can indeed lead to significantly different empirical results. This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. We also show that the two steps of approximations in the original motivation of SAM individually lead to inaccurate local conclusions, but their combination accidentally reveals the correct effect, when full-batch gradients are applied. Furthermore, we also prove that the stochastic version of SAM in fact regularizes the third notion of sharpness mentioned above, which is most likely to be the preferred notion for practical performance. The key mechanism behind this intriguing phenomenon is the alignment between the gradient and the top eigenvector of Hessian when SAM is applied.
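The "computationally efficient variant" mentioned in the abstract is usually described as a two-step update: perturb the weights a distance rho along the normalized gradient, then descend using the gradient evaluated at the perturbed point. The following is a minimal sketch of that full-batch update on a toy quadratic, not the authors' code. The loss, the `CURVATURES` values, and the function names `sam_perturb`, `sam_step`, and `ascent_sharpness` are all assumptions made for illustration.

```python
import math

# Hypothetical toy loss: L(x) = 0.5 * (a0 * x0^2 + a1 * x1^2),
# whose Hessian is diag(a0, a1). These values are illustrative only.
CURVATURES = (1.0, 10.0)


def loss(x):
    """Quadratic loss at point x (a list of two floats)."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(CURVATURES, x))


def grad(x):
    """Exact gradient of the quadratic loss."""
    return [a * xi for a, xi in zip(CURVATURES, x)]


def sam_perturb(x, rho):
    """Ascent step: move x a distance rho along its normalized gradient."""
    g = grad(x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(x)  # no ascent direction at a stationary point
    return [xi + rho * gi / norm for xi, gi in zip(x, g)]


def sam_step(x, lr, rho):
    """One full-batch SAM update: descend with the gradient at the perturbed point."""
    g_adv = grad(sam_perturb(x, rho))
    return [xi - lr * gi for xi, gi in zip(x, g_adv)]


def ascent_sharpness(x, rho):
    """Loss increase after the ascent step, one candidate notion of sharpness."""
    return loss(sam_perturb(x, rho)) - loss(x)
```

With `rho = 0` the perturbation vanishes and `sam_step` reduces to plain gradient descent, which makes the role of the perturbation radius easy to isolate when experimenting with this sketch.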


