For training neural networks, flat-minima optimizers, which seek parameters in neighborhoods of uniformly low loss (flat minima), have been shown to improve upon stochastic and adaptive gradient-based methods. Two families of methods for finding flat minima stand out: 1. averaging methods (e.g., Stochastic Weight Averaging, SWA), and 2. minimax methods (e.g., Sharpness-Aware Minimization, SAM). However, despite similar motivations, there has been limited investigation into their properties and no comprehensive comparison between them. In this work, we investigate the loss surfaces that result from a systematic benchmark of these approaches across computer vision, natural language processing, and graph learning tasks. This leads us to a hypothesis: since the two approaches find flat solutions in orthogonal ways, combining them should improve generalization even further. We verify that the combination improves over either flat-minima approach alone in 39 out of 42 cases, and provide potential explanations for the cases where it does not. We hope our results across image, graph, and text data will help researchers improve deep learning optimizers, and practitioners choose the right optimizer for the problem at hand.
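To make the combination concrete, below is a minimal PyTorch sketch, written for illustration only and not the implementation evaluated in this work, of wrapping SWA's weight averaging around SAM's two-step ascent/descent update. The model, data loader, learning rate, and the SAM radius `rho` are placeholder assumptions.

```python
import torch
from torch.optim.swa_utils import AveragedModel, update_bn

# Placeholder model and data; any architecture/dataset would work (assumption).
model = torch.nn.Linear(10, 2)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(4)]
criterion = torch.nn.CrossEntropyLoss()
base_opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
swa_model = AveragedModel(model)   # maintains the running average of weights (SWA)
rho = 0.05                         # SAM neighborhood radius (hypothetical value)

for epoch in range(2):
    for x, y in loader:
        # SAM step 1: gradient at the current weights, then ascend to the
        # worst-case point within an L2 ball of radius rho.
        criterion(model(x), y).backward()
        with torch.no_grad():
            grad_norm = torch.norm(torch.stack(
                [p.grad.norm() for p in model.parameters()]))
            eps = [rho * p.grad / (grad_norm + 1e-12) for p in model.parameters()]
            for p, e in zip(model.parameters(), eps):
                p.add_(e)
        base_opt.zero_grad()

        # SAM step 2: gradient at the perturbed weights, applied to the
        # original (restored) weights.
        criterion(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                p.sub_(e)
        base_opt.step()
        base_opt.zero_grad()

    # SWA: fold the current weights into the running average once per epoch.
    swa_model.update_parameters(model)

# Recompute batch-norm statistics for the averaged weights before evaluation
# (a no-op here, since the placeholder model has no batch-norm layers).
update_bn(loader, swa_model)
```

In this sketch, SAM shapes each individual update toward flat regions, while SWA averages the resulting iterates, so the two mechanisms operate at different timescales and can be composed directly.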