尖锐度最小化知识分析 (Stability Analysis of Sharpness-Aware Minimization) - 专知论文

会员服务 ·

0

鞍点 · Analysis · 随机动力系统 · 损失 · 平坦最小值 ·

2023 年 1 月 16 日

Stability Analysis of Sharpness-Aware Minimization

翻译：尖锐度最小化知识分析

Hoki Kim,Jinseong Park,Yujin Choi,Jaewook Lee

Sharpness-aware minimization (SAM) is a recently proposed training method that seeks to find flat minima in deep learning, resulting in state-of-the-art performance across various domains. Instead of minimizing the loss of the current weights, SAM minimizes the worst-case loss in its neighborhood in the parameter space. In this paper, we demonstrate that SAM dynamics can have convergence instability that occurs near a saddle point. Utilizing the qualitative theory of dynamical systems, we explain how SAM becomes stuck in the saddle point and then theoretically prove that the saddle point can become an attractor under SAM dynamics. Additionally, we show that this convergence instability can also occur in stochastic dynamical systems by establishing the diffusion of SAM. We prove that SAM diffusion is worse than that of vanilla gradient descent in terms of saddle point escape. Further, we demonstrate that often overlooked training tricks, momentum and batch-size, are important to mitigate the convergence instability and achieve high generalization performance. Our theoretical and empirical results are thoroughly verified through experiments on several well-known optimization problems and benchmark tasks.

翻译：锐化意识最小化(SAM)是最近提出的一种培训方法,旨在寻找深层学习中平坦的迷你,从而在各个领域取得最先进的业绩。SAM不是将现有重量的损失降到最低,而是将参数空间附近最坏的情况损失降到最低。在本文中,我们证明SAM的动态动态可能会在马鞍点附近出现趋同性不稳定。我们利用动态系统的质量理论,解释SAM如何被困在马鞍点上,然后从理论上证明,在SAM的动态下,马鞍点可以成为吸引器。此外,我们表明,通过建立SAM的传播,这种趋同性动态系统也可能出现这种趋同性不稳定。我们证明,SAM的传播比马鞍点脱落时的香草梯度梯落差。此外,我们证明,常常忽视训练技巧、动力和批量大小,对于减轻趋同性不稳定和取得高一般化业绩非常重要。我们理论和经验的结果通过对几个众所周知的优化问题和基准任务进行实验得到彻底核实。

0

相关内容

在数学中，鞍点或极大极小点是函数图形表面上的一点，其正交方向上的斜率(导数)都为零，但它不是函数的局部极值。鞍点是在某一轴向(峰值之间)有一个相对最小的临界点，在交叉轴上有一个相对最大的临界点。

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

GPC3调控肝细胞癌侵袭和转移的信号通路研究

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

飞秒激光湿法刻蚀微纳制造的基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

地基InSAR高边坡三维变形提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

计及非局部阻尼效应的复合材料结构多尺度分析及动特性一体化优化设计

国家自然科学基金

0+阅读 · 2012年12月31日

基于振动响应二次协方差矩阵的服役桥梁环境振动健康监测方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

城镇化快速进程中城市公共交通优先发展政策研究

国家自然科学基金

1+阅读 · 2012年2月29日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

组合导航系统中基于混沌、小波和神经网络的信息融合方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

An analytic theory for the dynamics of wide quantum neural networks

Arxiv

0+阅读 · 2023年3月10日

Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes

Arxiv

0+阅读 · 2023年3月10日

Mixed-effects location-scale model based on generalized hyperbolic distribution

Arxiv

0+阅读 · 2023年3月10日

Variational Quantum Neural Networks (VQNNS) in Image Classification

Arxiv

0+阅读 · 2023年3月10日

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Arxiv

0+阅读 · 2023年3月10日

Teleoperation of Soft Modular Robots: Study on Real-time Stability and Gait Control

Arxiv

0+阅读 · 2023年3月9日

Goal-oriented Policies for Cost of Actuation Error Minimization in Wireless Autonomous Systems

Arxiv

0+阅读 · 2023年3月8日

A comparison of rational and neural network based approximations

Arxiv

0+阅读 · 2023年3月8日

Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

Arxiv

0+阅读 · 2023年3月8日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

VIP会员

文章信息

相关主题

随机动力系统

平坦最小值

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

【论文推荐】最新八篇网络节点表示相关论文—可扩展嵌入、对抗自编码器、图划分、异构信息、显式矩阵分解、深度高斯、图、随机游走

专知

14+阅读 · 2018年3月30日

相关论文

An analytic theory for the dynamics of wide quantum neural networks

Arxiv

0+阅读 · 2023年3月10日

Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes

Arxiv

0+阅读 · 2023年3月10日

Mixed-effects location-scale model based on generalized hyperbolic distribution

Arxiv

0+阅读 · 2023年3月10日

Variational Quantum Neural Networks (VQNNS) in Image Classification

Arxiv

0+阅读 · 2023年3月10日

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Arxiv

0+阅读 · 2023年3月10日

Teleoperation of Soft Modular Robots: Study on Real-time Stability and Gait Control

Arxiv

0+阅读 · 2023年3月9日

Goal-oriented Policies for Cost of Actuation Error Minimization in Wireless Autonomous Systems

Arxiv

0+阅读 · 2023年3月8日

A comparison of rational and neural network based approximations

Arxiv

0+阅读 · 2023年3月8日

Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

Arxiv

0+阅读 · 2023年3月8日

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+阅读 · 2019年9月25日

相关基金

重离子储存环CSRe上激光冷却相对论能量类锂12C3+离子束的实验研究

国家自然科学基金

0+阅读 · 2015年12月31日

GPC3调控肝细胞癌侵袭和转移的信号通路研究

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

飞秒激光湿法刻蚀微纳制造的基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

地基InSAR高边坡三维变形提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

计及非局部阻尼效应的复合材料结构多尺度分析及动特性一体化优化设计

国家自然科学基金

0+阅读 · 2012年12月31日

基于振动响应二次协方差矩阵的服役桥梁环境振动健康监测方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

城镇化快速进程中城市公共交通优先发展政策研究

国家自然科学基金

1+阅读 · 2012年2月29日

《计算机研究与发展》学术期刊

国家自然科学基金

1+阅读 · 2011年12月31日

组合导航系统中基于混沌、小波和神经网络的信息融合方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员