In non-convex settings, it is well established that gradient-based algorithms behave differently in the vicinity of the local structures of the objective function, such as strict and non-strict saddle points and local and global minima and maxima. It is therefore crucial to describe the landscape of non-convex problems, that is, to describe as precisely as possible the set of points in each of the above categories. In this work, we study the landscape of the empirical risk associated with deep linear neural networks and the square loss. It is known that, under weak assumptions, this objective function has no spurious local minima and no local maxima. We go a step further and characterize, among all critical points, which ones are global minimizers, strict saddle points, and non-strict saddle points, and we enumerate all the associated critical values. The characterization is simple, involves conditions on the ranks of partial matrix products, and sheds some light on the global convergence and implicit regularization phenomena that have been proved or observed when optimizing linear neural networks. In passing, we also provide an explicit parameterization of the set of all global minimizers and exhibit large sets of strict and non-strict saddle points.
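For concreteness, here is a minimal sketch of the kind of objective referred to above, assuming the standard formulation of a depth-$H$ linear network with weight matrices $W_1,\dots,W_H$, input data $X$, and targets $Y$ (the symbols and the precise assumptions on the data are illustrative, not those stated in the paper):
\[
L(W_1,\dots,W_H) \;=\; \tfrac{1}{2}\,\bigl\| W_H W_{H-1} \cdots W_1 X - Y \bigr\|_F^2 .
\]
In this setting, the rank conditions mentioned in the abstract concern partial matrix products of the form $W_j W_{j-1} \cdots W_i$ for $1 \le i \le j \le H$.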