Stochastic Weight Averaging Revisited - Zhuanzhi (专知) Papers


Weight · SGD · Backbone · Generalization theory · Statistics

May 17, 2022

Stochastic Weight Averaging Revisited


Hao Guo, Jiyong Jin, Bin Liu

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective way to help the backbone SGD find optima that generalize better. From a statistical perspective, weight averaging (WA) contributes to variance reduction. The recently proposed and now well-established stochastic weight averaging (SWA) method is characterized by a cyclical or high constant (CHC) learning rate schedule (LRS), which is applied while generating the weight samples for the WA operation. SWA gave rise to a new insight on WA, namely that WA helps to discover wider optima and thereby leads to better generalization. We conduct extensive experimental studies of SWA, involving a dozen modern DNN model structures and a dozen open-source benchmark image, graph, and text datasets. We disentangle the contributions of the WA operation and the CHC LRS to SWA, showing that the WA operation in SWA still contributes to variance reduction but does not always lead to wide optima. We show how the statistical and geometric views of SWA can be reconciled. Based on our experimental findings, we hypothesize that there are global-scale geometric structures in the DNN loss landscape that an SGD agent can discover at the early stage of its working period, and that the WA operation can exploit such structures. This hypothesis inspires an algorithm design termed periodic SWA (PSWA). We find that PSWA outperforms its backbone SGD remarkably during the early stage of the SGD sampling process, and thus demonstrate that our hypothesis holds. Code for reproducing the experimental results can be found at https://github.com/ZJLAB-AMMI/PSWA.
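The variance-reduction view of WA mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code and does not reproduce their experiments: it assumes a one-dimensional quadratic loss with Gaussian gradient noise, a high constant learning rate, and a burn-in period before averaging starts. The names `sgd_with_weight_averaging` and `mean_squared_distance` are hypothetical, chosen only for this illustration.

```python
import random


def sgd_with_weight_averaging(steps=2000, burn_in=500, lr=0.1, noise=1.0, seed=0):
    """Run noisy SGD on f(w) = 0.5 * w**2 and average the sampled weights.

    Returns (last_iterate, averaged_weight). Averaging starts after
    `burn_in` steps; all settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = 5.0        # arbitrary starting point
    w_avg = 0.0    # running mean of the sampled weights
    n_avg = 0      # number of samples folded into the mean so far
    for t in range(steps):
        grad = w + rng.gauss(0.0, noise)   # exact gradient plus Gaussian noise
        w -= lr * grad                     # high constant learning rate
        if t >= burn_in:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg   # incremental mean update
    return w, w_avg


def mean_squared_distance(values):
    """Mean squared distance to the optimum, which is w = 0 for this loss."""
    return sum(v * v for v in values) / len(values)


# Repeat the run with different seeds to estimate the spread of each estimator.
runs = [sgd_with_weight_averaging(seed=s) for s in range(200)]
last_mse = mean_squared_distance([last for last, _ in runs])
avg_mse = mean_squared_distance([avg for _, avg in runs])
print(f"last iterate: {last_mse:.4f}  averaged weights: {avg_mse:.4f}")
```

In this toy setting the averaged weights end up much closer to the optimum than the last SGD iterate, because the iterates keep fluctuating around the minimum at a constant learning rate while their mean does not. It says nothing about the width of the optima, which is the geometric question the paper examines separately.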


