Stochastic gradient descent (SGD), an algorithm widely used to train deep neural networks, has attracted continued study of the theoretical principles behind its success. A recent work uncovered a generic inverse variance-flatness (IVF) relation between the variance of the neural-network weights and the flatness of the loss landscape near solutions found by SGD [Feng & Tu, PNAS 118, 0027 (2021)]. To investigate this apparent violation of statistical principles, we deploy a stochastic decomposition to analyze the dynamical properties of SGD. The method constructs the true "energy" function that enters the Boltzmann distribution. This energy differs from the usual cost function and explains the IVF relation under SGD. We further verify the scaling relation identified in Feng and Tu's work. Our approach may bridge the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential to yield better algorithms for the latter.
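As a minimal sketch of the kind of construction the abstract alludes to (the notation here, $L$ for the loss, $\Phi$ for the constructed energy, $D$ and $Q$ for the symmetric and antisymmetric parts of the decomposition, and $\epsilon$ for the noise strength, is generic to potential-decomposition treatments of SGD and is not taken verbatim from the paper), one may model SGD near a solution as a stochastic differential equation and decompose its drift so that the steady state is Boltzmann-like in $\Phi$ rather than in $L$:

\begin{align}
  % SGD near a solution modeled as a stochastic differential equation
  d\theta &= -\nabla L(\theta)\,dt + \sqrt{2\epsilon}\,\sigma(\theta)\,dW_t, \\
  % stochastic decomposition of the drift into a constructed potential \Phi,
  % a symmetric part D and an antisymmetric part Q
  -\nabla L(\theta) &= -\bigl[D(\theta) + Q(\theta)\bigr]\nabla\Phi(\theta), \\
  % the steady-state distribution is Boltzmann-like in \Phi, not in the loss L
  P_{\mathrm{ss}}(\theta) &\propto \exp\!\bigl[-\Phi(\theta)/\epsilon\bigr], \\
  % equipartition along a principal direction i then ties the weight variance
  % to the curvature of \Phi rather than to the flatness of L
  \langle \delta\theta_i^{2} \rangle &\approx \frac{\epsilon}{\partial_i^{2}\Phi .}
\end{align}

In such a picture, because $\Phi$ generally differs from $L$, the weight variance along a given direction tracks the curvature of $\Phi$, so a direction that is flat in $L$ need not exhibit a large variance; this is how a Boltzmann-type description can coexist with the IVF relation without contradicting equipartition.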