Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank - 专知论文


Rank · Biased · Factorization · Networking · Early stopping

March 19, 2021

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank


Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

In many deep learning scenarios, more network parameters than training examples are used. In such situations, several networks can often be found that exactly interpolate the data. This means that the learning algorithm used induces an implicit bias on the chosen network. This paper aims at shedding some light on the nature of such implicit bias in a certain simplified setting of linear networks, i.e., deep matrix factorizations. We provide a rigorous analysis of the dynamics of vanilla gradient descent. We characterize the dynamical behaviour of ground-truth eigenvectors and the convergence of the corresponding eigenvalues to the true ones. As a consequence, for exactly characterized time intervals, the effective rank of the gradient descent iterates is provably close to the effective rank of a low-rank projection of the ground-truth matrix, such that early stopping of gradient descent produces regularized solutions that may be used for denoising, for instance. In particular, apart from a few initial steps of the iterations, the effective rank of our matrix is monotonically increasing, suggesting that "matrix factorization implicitly enforces gradient descent to take a route in which the effective rank is monotone". Since empirical observations in more general scenarios such as matrix sensing show a similar phenomenon, we believe that our theoretical results help in understanding the still mysterious "implicit bias" of gradient descent in deep learning.
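The setting described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code; it rests on several assumptions that the abstract does not state: the loss is half the squared Frobenius distance between the product of the factors and a symmetric target matrix, all factors start at the same multiple `alpha` of the identity, and the effective rank is taken to be the nuclear norm divided by the spectral norm. The function names and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def chain(mats, n):
    """Return mats[-1] @ ... @ mats[0] (the n x n identity if mats is empty)."""
    out = np.eye(n)
    for m in mats:
        out = m @ out
    return out


def effective_rank(m):
    """Nuclear norm over spectral norm (an assumed definition of effective rank)."""
    s = np.linalg.svd(m, compute_uv=False)
    return s.sum() / s.max()


def deep_factorization_gd(target, depth=3, alpha=0.1, step=0.05, iters=10000):
    """Vanilla gradient descent on 0.5 * ||W_depth ... W_1 - target||_F^2.

    Every factor is initialized to alpha * I (an assumption of this sketch).
    Returns the final product and the effective rank of the product
    recorded before each update.
    """
    n = target.shape[0]
    factors = [alpha * np.eye(n) for _ in range(depth)]
    ranks = []
    for _ in range(iters):
        product = chain(factors, n)
        ranks.append(effective_rank(product))
        residual = product - target
        # Gradient w.r.t. factor j: (factors to its left)^T R (factors to its right)^T
        grads = [
            chain(factors[j + 1:], n).T @ residual @ chain(factors[:j], n).T
            for j in range(depth)
        ]
        factors = [w - step * g for w, g in zip(factors, grads)]
    return chain(factors, n), ranks


if __name__ == "__main__":
    # Hypothetical ground truth with well-separated eigenvalues.
    target = np.diag([3.0, 1.0, 0.1])
    product, ranks = deep_factorization_gd(target)
    for k in (0, 100, 500, 2000, len(ranks) - 1):
        print(f"iteration {k}: effective rank {ranks[k]:.4f}")
```

Because the initial product is a multiple of the identity, its effective rank equals the matrix dimension, so any drop in the first iterations corresponds to the "few initial steps" that the abstract excludes. Stopping the loop early (a smaller `iters`) returns an iterate in which the small eigenvalues of the target have not yet been fitted, which is the regularizing effect the abstract attributes to early stopping.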



Related content

[UC Berkeley-Tsinghua] Implicit Graph Neural Networks
专知 Member Service · 24+ reads · September 15, 2020

[ICML 2020] On the Generalization Benefit of Noise in Stochastic Gradient Descent
专知 Member Service · 19+ reads · June 29, 2020

Survey paper on recent advances in meta-learning
专知 Member Service · 281+ reads · May 8, 2020

[University of Oxford, Yee Whye Teh] On Statistical Thinking in Deep Learning, with a 49-page slide deck
专知 Member Service · 63+ reads · November 24, 2019

[BAAI Conference 2019] Optimization for Overparametrized Deep Neural Networks, Peking University | Wang Liwei
专知 Member Service · 23+ reads · November 21, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation
专知 Member Service · 49+ reads · October 17, 2019

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs
专知 Member Service · 36+ reads · October 17, 2019

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation
专知 Member Service · 59+ reads · October 17, 2019

Keras, François Chollet, "Deep Learning with Python", 386-page PDF
专知 Member Service · 160+ reads · October 12, 2019

[Carnegie Mellon University] Deep Learning in Computer Vision: Methods, Interpretability, Causality, and Fairness
专知 Member Service · 83+ reads · October 9, 2019

Pair-based loss in metric learning
极市平台 · 65+ reads · July 17, 2019

A new perspective on catastrophic forgetting: the transfer-interference trade-off
CreateAMind · 17+ reads · July 6, 2019

Hierarchically Structured Meta-learning
CreateAMind · 27+ reads · May 22, 2019

Transferring Knowledge across Learning Processes
CreateAMind · 29+ reads · May 18, 2019

Uneven sample contributions: Focal Loss and the Gradient Harmonizing Mechanism
极市平台 · 25+ reads · April 25, 2019

Unsupervised meta-learning for representation learning
CreateAMind · 27+ reads · January 4, 2019

Unsupervised Learning via Meta-Learning
CreateAMind · 43+ reads · January 3, 2019

Meta-learning in 2017: MAML, SNAIL
CreateAMind · 11+ reads · January 2, 2019

[Paper] A mathematical explanation of deep learning
机器学习研究会 · 10+ reads · December 15, 2017

[Tutorial] SVM data classification (Python)
机器学习研究会 · 6+ reads · October 15, 2017

On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks
Arxiv · 0+ reads · May 13, 2021

A Sharp Analysis of Covariate Adjusted Precision Matrix Estimation via Alternating Gradient Descent with Hard Thresholding
Arxiv · 0+ reads · May 10, 2021

Equivalent formulations of the oxygen depletion problem, other implicit free boundary value problems, and implications for numerical approximation
Arxiv · 0+ reads · May 7, 2021

On the Optimality of Nuclear-norm-based Matrix Completion for Problems with Smooth Non-linear Structure
Arxiv · 0+ reads · May 5, 2021

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
Arxiv · 0+ reads · May 4, 2021

Implicit Regularization in Deep Tensor Factorization
Arxiv · 0+ reads · May 4, 2021

Recent advances in deep learning theory
Arxiv · 50+ reads · December 20, 2020

Meta-Learning with Implicit Gradients
Arxiv · 13+ reads · September 10, 2019

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Arxiv · 4+ reads · May 9, 2019

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Arxiv · 8+ reads · November 21, 2018
