Implicit Regularization and Convergence for Weight Normalization (Zhuanzhi paper page)


Tags: weight normalization · Weight · regularization term · normalized · minimum point

October 19, 2020


Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu

From arXiv; NeurIPS 2020

Normalization methods such as batch [Ioffe and Szegedy, 2015], weight [Salimans and Kingma, 2016], instance [Ulyanov et al., 2016], and layer normalization [Ba et al., 2016] have been widely used in modern machine learning. Here, we study the weight normalization (WN) method [Salimans and Kingma, 2016] and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least-squares regression. WN and rPGD reparametrize the weights with a scale g and a unit vector w, and thus the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and converge close to the minimum ℓ2-norm solution, even for initializations far from zero. For certain stepsizes of g and w, we show that they can converge close to the minimum-norm solution. This is different from the behavior of gradient descent, which converges to the minimum-norm solution only when started at a point in the range space of the feature matrix, and is thus more sensitive to initialization.
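The contrast the abstract draws between plain gradient descent and the WN reparametrization can be illustrated with a small NumPy sketch. It assumes the loss f(x) = ½‖Ax − y‖² with a random 5×20 Gaussian matrix A (fewer equations than unknowns), writes the WN weights as x = g·v/‖v‖ with an unnormalized direction v (so v/‖v‖ plays the role of the unit vector w), and uses stepsizes picked purely for illustration. All function names are hypothetical; this is not the authors' code, and rPGD is not sketched. The GD part shows why initialization matters: gradients A^T(Ax − y) lie in the row space of A, so the null-space component of x0 is never removed.

```python
import numpy as np


def gd_least_squares(A, y, x0, lr, steps):
    """Plain gradient descent on f(x) = 0.5 * ||A x - y||^2."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        x -= lr * (A.T @ (A @ x - y))
    return x


def wn_loss_and_grads(A, y, g, v):
    """Loss and gradients for the weight-normalized form x = g * v / ||v||."""
    nv = np.linalg.norm(v)
    u = v / nv                               # unit direction
    res = A @ (g * u) - y
    r = A.T @ res                            # gradient of f with respect to x
    grad_g = r @ u                           # df/dg
    grad_v = (g / nv) * (r - (r @ u) * u)    # chain rule through v / ||v||
    return 0.5 * res @ res, grad_g, grad_v


def wn_gd_least_squares(A, y, g0, v0, lr_g, lr_v, steps):
    """Simultaneous gradient steps on (g, v); stepsizes are illustrative only."""
    g, v = float(g0), v0.astype(float).copy()
    for _ in range(steps):
        _, grad_g, grad_v = wn_loss_and_grads(A, y, g, v)
        g -= lr_g * grad_g
        v -= lr_v * grad_v
    return g * v / np.linalg.norm(v)


# Overparametrized problem: fewer equations (m) than unknowns (d).
rng = np.random.default_rng(0)
m, d = 5, 20
A = rng.standard_normal((m, d))
y = rng.standard_normal(m)

x_min = np.linalg.pinv(A) @ y               # minimum l2-norm interpolating solution
P_row = np.linalg.pinv(A) @ A               # projector onto the row space of A
x0 = 3.0 * rng.standard_normal(d)           # initialization far from zero
lr = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / largest eigenvalue of A^T A

# GD never changes the null-space part of x0, so it converges to
# x_min + (I - P_row) x0 rather than to x_min.
x_gd = gd_least_squares(A, y, x0, lr, 20000)
x_gd_pred = x_min + (np.eye(d) - P_row) @ x0

# Weight-normalized GD from the same point: g = ||x0||, v = x0.
x_wn = wn_gd_least_squares(A, y, np.linalg.norm(x0), x0, 0.5 * lr, 0.5 * lr, 20000)

print("||x_min|| =", np.linalg.norm(x_min))
print("||x_gd||  =", np.linalg.norm(x_gd))
print("||x_wn||  =", np.linalg.norm(x_wn), "residual:", np.linalg.norm(A @ x_wn - y))
```

The script prints norms rather than asserting where x_wn lands: according to the abstract, WN gets close to the minimum-norm solution only for certain stepsizes of g and w, so the outcome of this run depends on the illustrative values of lr_g and lr_v.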

