mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization - Zhuanzhi Papers


Generalization theory · Minima · Image classification · Loss function (machine learning) · Performer

February 19, 2023

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization


Kayhan Behdin,Qingquan Song,Aman Gupta,Ayan Acharya,David Durfee,Borja Ocejo,Sathiya Keerthi,Rahul Mazumder

From arXiv. arXiv admin note: substantial text overlap with arXiv:2212.04343

Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. To account for this, Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abilities. In this paper, we focus on a variant of SAM known as micro-batch SAM (mSAM), which, during training, averages the updates generated by adversarial perturbations across several disjoint shards (micro batches) of a mini-batch. We extend a recently developed and well-studied general framework for flatness analysis to show that distributed gradient computation for sharpness-aware minimization theoretically achieves even flatter minima. In order to support this theoretical superiority, we provide a thorough empirical evaluation on a variety of image classification and natural language processing tasks. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our practical implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework.
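The update described in the abstract (split a mini-batch into disjoint micro-batches, compute a sharpness-aware gradient on each shard, then average) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the least-squares loss, the helper names `loss_grad` and `msam_step`, and the values of `rho`, `lr`, and `num_micro_batches` are all assumptions chosen only for illustration.

```python
import numpy as np

def loss_grad(w, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ w - y)**2) (assumed loss)."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def msam_step(w, X, y, num_micro_batches=4, rho=0.05, lr=0.1):
    """One mSAM-style update on the mini-batch (X, y); hyperparameters are illustrative."""
    # Split the mini-batch into disjoint shards (micro-batches).
    shards = zip(np.array_split(X, num_micro_batches),
                 np.array_split(y, num_micro_batches))
    sam_grads = []
    for X_s, y_s in shards:
        # Gradient at the current weights, computed on this shard only.
        g = loss_grad(w, X_s, y_s)
        # Adversarial perturbation of norm rho along the shard's gradient direction.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Sharpness-aware gradient: the shard's gradient at the perturbed weights.
        sam_grads.append(loss_grad(w + eps, X_s, y_s))
    # Average the per-shard gradients and take a plain gradient step.
    return w - lr * np.mean(sam_grads, axis=0)

# Toy usage on synthetic, noise-free regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true
w = np.zeros(5)
for _ in range(200):
    w = msam_step(w, X, y)
```

With `num_micro_batches=1` the sketch reduces to an ordinary SAM step on the full mini-batch. Because the per-shard computations are independent of one another, they could in principle run in parallel, which is the property the abstract refers to when it describes a parallelizable implementation.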

