We study how permutation symmetries in overparameterized multi-layer neural networks generate `symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $r^* + h =: m$, we explicitly describe the manifold of global minima: it consists of $T(r^*, m)$ affine subspaces of dimension at least $h$ that are connected to one another. For a network of width $m$, we identify the number $G(r, m)$ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $r < r^*$. Via a combinatorial analysis, we derive closed-form formulas for $T$ and $G$ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $h$), and that the opposite holds in the vastly overparameterized regime ($h \gg r^*$). Our results provide new insights into the minimization of the non-convex loss function of overparameterized neural networks.
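As a minimal sketch of where the dimension-$h$ affine subspaces come from (written in the abstract's notation; the symbols $\sigma$, $w_j$, $a_j$ are illustrative and the paper's exact parameterization may differ): write the two-layer network as $f_\theta(x) = \sum_{j=1}^{m} a_j\,\sigma(w_j^\top x)$ and suppose the minimal width-$r^*$ network $\sum_{j=1}^{r^*} a_j^*\,\sigma(w_j^{*\top} x)$ attains zero loss. Duplicating the first neuron and splitting its outgoing weight gives, for every $t \in \mathbb{R}$,
\[
  t\,\sigma(w_1^{*\top} x) + (a_1^* - t)\,\sigma(w_1^{*\top} x) + \sum_{j=2}^{r^*} a_j^*\,\sigma(w_j^{*\top} x)
  \;=\; \sum_{j=1}^{r^*} a_j^*\,\sigma(w_j^{*\top} x),
\]
so the width-$(r^* + 1)$ network contains an affine line of global minima, and splitting across $h$ extra neurons traces out an affine subspace of dimension at least $h$. Relabeling which neuron is duplicated then moves continuously between minima that were isolated permutations of one another at minimal width, which is the mechanism behind the connectivity statement above.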