Unveiling the structure of wide flat minima in neural networks - Zhuanzhi (专知) Papers

Minima · Flat minima · Neural Networks · Networking · Nonconvex

July 2, 2021

Unveiling the structure of wide flat minima in neural networks

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina

From arXiv, 10 pages, 4 figures

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks which needs to be understood in depth. Such algorithms are able to fit the data almost perfectly, even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise from the coalescence of minima that correspond to high-margin classifications. Despite being exponentially rare compared to zero-margin solutions, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.
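
To make the two notions in the abstract concrete, here is a minimal sketch for a single linear unit. It is an illustration under stated assumptions, not the paper's analytical method: the toy dataset, the function names `margin` and `perturbed_error`, and the Gaussian-noise flatness proxy are all hypothetical choices made for this example. The margin is taken as the smallest normalized stability min_i y_i (w · x_i) / ||w||, and flatness is probed by the mean training error after random perturbations of the weights.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def margin(w, data):
    """Smallest normalized stability: min_i y_i * (w . x_i) / ||w||."""
    norm = math.sqrt(dot(w, w))
    return min(y * dot(w, x) / norm for x, y in data)


def perturbed_error(w, data, sigma, trials, rng):
    """Mean training error after adding Gaussian noise of scale sigma to w.

    A crude flatness proxy (an assumption of this sketch): values near zero
    mean the solution sits in a region where nearby weights also fit the data.
    """
    total = 0.0
    for _ in range(trials):
        noisy = [wi + rng.gauss(0.0, sigma) for wi in w]
        wrong = sum(1 for x, y in data if y * dot(noisy, x) <= 0)
        total += wrong / len(data)
    return total / trials


# Hypothetical toy dataset: two points separated along the first axis.
data = [((1.0, 0.0), 1), ((-1.0, 0.0), -1)]
rng = random.Random(0)

# Both weight vectors classify the data correctly, with different margins.
for w in [(1.0, 0.0), (1.0, 1.0)]:
    print(w, margin(w, data), perturbed_error(w, data, 0.5, 1000, rng))
```

For a linear unit the link between the two quantities is elementary: a perturbation δ with ||δ|| < κ ||w|| / max_i ||x_i||, where κ is the margin, cannot flip any training label, so a larger margin guarantees a larger error-free neighbourhood. The paper's claims concern nonconvex networks and are considerably stronger than this toy case.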
