Stochastic Weight Averaging Revisited - 专知 Papers

Weight · SGD · Backbone · Generalization theory · Statistics

August 22, 2022

Stochastic Weight Averaging Revisited

Hao Guo, Jiyong Jin, Bin Liu

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective way to help the backbone SGD find better optima in terms of generalization. From a statistical perspective, weight averaging (WA) contributes to variance reduction. The well-established stochastic weight averaging (SWA) method was recently proposed; its distinguishing feature is a cyclical or high constant (CHC) learning rate schedule (LRS) used while generating weight samples for the WA operation. With it came a new insight on WA: that WA helps to discover wider optima, which in turn leads to better generalization. We conduct extensive experimental studies of SWA, involving a dozen modern DNN model structures and a dozen benchmark open-source image, graph, and text datasets. We disentangle the contributions of the WA operation and the CHC LRS to SWA, showing that the WA operation in SWA still contributes to variance reduction but does not always lead to wide optima. We show how the statistical and geometric views on SWA reconcile. Based on our experimental findings, we raise the hypothesis that there are global-scale geometric structures in the DNN loss landscape that an SGD agent can discover at the early stage of its working period, and that the WA operation can exploit such structures. This hypothesis inspires an algorithm design termed periodic SWA (PSWA). We find that PSWA outperforms its backbone SGD remarkably during the early stage of the SGD sampling process, and thus demonstrate that our hypothesis holds. Code for reproducing the experimental results can be found at https://github.com/ZJLAB-AMMI/PSWA.
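The variance-reduction view of WA described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation (which lives in the repository linked above): the toy loss 0.5·‖w‖², the Gaussian gradient noise standing in for minibatch noise, the high constant learning rate of 0.5, and the helper names `noisy_grad`, `sgd_samples`, `average`, and `periodic_wa`. In particular, `periodic_wa` is only one possible reading of "periodic": it averages the samples of each period and restarts SGD from that average.

```python
import random


def loss(w):
    # Toy convex loss 0.5 * ||w||^2 with its optimum at the origin (an assumption).
    return 0.5 * sum(wi * wi for wi in w)


def noisy_grad(w, rng, noise=1.0):
    # Exact gradient of the toy loss plus Gaussian noise, mimicking minibatch noise.
    return [wi + noise * rng.gauss(0.0, 1.0) for wi in w]


def sgd_samples(w, steps, lr, rng):
    """Run constant-learning-rate SGD from w and return every visited weight vector."""
    samples = []
    for _ in range(steps):
        g = noisy_grad(w, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        samples.append(w)
    return samples


def average(samples):
    """Coordinate-wise mean of a list of weight vectors (the WA operation)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def periodic_wa(w0, periods, steps_per_period, lr, rng):
    """Hypothetical periodic scheme: average each period, then restart SGD from the average."""
    w = w0
    for _ in range(periods):
        w = average(sgd_samples(w, steps_per_period, lr, rng))
    return w


rng = random.Random(0)
w0 = [5.0] * 10

# Plain WA: average the tail of a single SGD run with a high constant learning rate.
samples = sgd_samples(w0, steps=200, lr=0.5, rng=rng)
tail = samples[100:]
w_avg = average(tail)

mean_sample_loss = sum(loss(w) for w in tail) / len(tail)
avg_loss = loss(w_avg)
print(f"mean loss of individual samples: {mean_sample_loss:.4f}")
print(f"loss of the averaged weights:    {avg_loss:.4f}")

# Periodic variant: four periods of 50 steps each, restarting from each average.
w_periodic = periodic_wa(w0, periods=4, steps_per_period=50, lr=0.5, rng=random.Random(0))
print(f"loss after periodic averaging:   {loss(w_periodic):.4f}")
```

On this convex toy problem the averaged weights can never have a higher loss than the mean loss of the samples being averaged, so the sketch only shows the statistical effect. It says nothing about the width of optima or the global geometric structures that the paper studies on real DNN loss landscapes.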
