全球最小化器 (Label Noise SGD Provably Prefers Flat Global Minimizers) - 专知论文

会员服务 ·

0

SGD · 正则化项 · 全局极小解 · 噪声 · 标注 ·

2021 年 6 月 11 日

Label Noise SGD Provably Prefers Flat Global Minimizers

翻译：全球最小化器

Alex Damian,Tengyu Ma,Jason Lee

from arxiv, 53 pages, 4 figures, under review for NeurIPS 2021

In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to. Motivated by empirical studies that demonstrate that training with noisy labels improves generalization, we study the implicit regularization effect of SGD with label noise. We show that SGD with label noise converges to a stationary point of a regularized loss $L(\theta) +\lambda R(\theta)$, where $L(\theta)$ is the training loss, $\lambda$ is an effective regularization parameter depending on the step size, strength of the label noise, and the batch size, and $R(\theta)$ is an explicit regularizer that penalizes sharp minimizers. Our analysis uncovers an additional regularization effect of large learning rates beyond the linear scaling rule that penalizes large eigenvalues of the Hessian more than small ones. We also prove extensions to classification with general loss functions, SGD with momentum, and SGD with general noise covariance, significantly strengthening the prior work of Blanc et al. to global convergence and large learning rates and of HaoChen et al. to general models.

翻译：在过度平衡的模型中,悬浮梯度下沉的噪音暗含地使优化轨迹规范化,并确定哪些地方最低的SGD(SGD)相交。受经验研究的激励,这些研究显示,用噪音标签进行的培训能够改善一般化。我们研究SGD隐含的规范化作用,用标签噪音研究SGD(SGD)隐含的规范化效应。我们的分析发现,标签噪音使SGD与正常化损失的固定点($L(theta) ⁇ lambda R(theta) 美元($)相交汇,而美元是培训损失的固定点($L(theta) ) 。美元是培训损失的固定点, 美元是当地最低SGD($) 。美元是有效的规范化参数,取决于步骤大小、标签噪音强度和批量规模,而美元(Rhetta) 是明确的常规规范化因素,惩罚锐化最小化的最小化者。我们的分析发现,除了线级缩标规则外,大型学习率比全球大的趋同率外,还有另一个的大型学习率。

0

相关内容

SGD

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

手写实现李航《统计学习方法》书中全部算法

专知会员服务

142+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

专知会员服务

92+阅读 · 2020年5月5日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

生成式对抗网络先验贝叶斯推断，Bayesian Inference with Generative Adversarial Network Priors

生成式对抗网络先验贝叶斯推断，Bayesian Inference with Generative Adversarial Network Priors

专知会员服务

28+阅读 · 2020年2月18日

【自监督学习新成果】基于对比预测编码的数据高效图像识别（Data-Efficient Image Recognition with Contrastive Predictive Coding）

【自监督学习新成果】基于对比预测编码的数据高效图像识别（Data-Efficient Image Recognition with Contrastive Predictive Coding）

专知会员服务

16+阅读 · 2019年12月10日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Arxiv

0+阅读 · 2021年8月10日

Effect of stepwise adjustment of Damping factor upon PageRank

Effect of stepwise adjustment of Damping factor upon PageRank

Arxiv

0+阅读 · 2021年8月9日

On the Hyperparameters in Stochastic Gradient Descent with Momentum

Arxiv

0+阅读 · 2021年8月9日

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

Arxiv

0+阅读 · 2021年8月6日

Interpolation can hurt robust generalization even when there is no noise

Arxiv

0+阅读 · 2021年8月5日

A gradient method for inconsistency reduction of pairwise comparisons matrices

Arxiv

0+阅读 · 2021年8月4日

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Arxiv

3+阅读 · 2019年3月24日

Piecewise Flat Embedding for Image Segmentation

Arxiv

3+阅读 · 2018年5月20日

Inference Suboptimality in Variational Autoencoders

Arxiv

3+阅读 · 2018年1月10日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

全局极小解

相关VIP内容

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

手写实现李航《统计学习方法》书中全部算法

专知会员服务

142+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

来自Fariz Darari博士的一份简明《神经网络与深度学习》的讲义，64页ppt

专知会员服务

92+阅读 · 2020年5月5日

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

最大均方差正则化贝叶斯神经网络，Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

专知会员服务

54+阅读 · 2020年3月5日

生成式对抗网络先验贝叶斯推断，Bayesian Inference with Generative Adversarial Network Priors

生成式对抗网络先验贝叶斯推断，Bayesian Inference with Generative Adversarial Network Priors

专知会员服务

28+阅读 · 2020年2月18日

【自监督学习新成果】基于对比预测编码的数据高效图像识别（Data-Efficient Image Recognition with Contrastive Predictive Coding）

【自监督学习新成果】基于对比预测编码的数据高效图像识别（Data-Efficient Image Recognition with Contrastive Predictive Coding）

专知会员服务

16+阅读 · 2019年12月10日

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

【课程】普林斯顿大学19年春季学期《机器学习优化》课程讲义

专知会员服务

85+阅读 · 2019年10月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

面向具身智能的多模态数据存储与检索：综述

《算法战争研究计划全景评估》35页

【CMU博士论文】水下三维视觉感知与生成

智能体战争：自主人工智能军备竞赛全景透视

相关资讯

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec 精选：基于参数共享的CNN-RNN混合模型

LibRec智能推荐

6+阅读 · 2019年3月7日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Arxiv

0+阅读 · 2021年8月10日

Effect of stepwise adjustment of Damping factor upon PageRank

Effect of stepwise adjustment of Damping factor upon PageRank

Arxiv

0+阅读 · 2021年8月9日

On the Hyperparameters in Stochastic Gradient Descent with Momentum

Arxiv

0+阅读 · 2021年8月9日

Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem

Arxiv

0+阅读 · 2021年8月6日

Interpolation can hurt robust generalization even when there is no noise

Arxiv

0+阅读 · 2021年8月5日

A gradient method for inconsistency reduction of pairwise comparisons matrices

Arxiv

0+阅读 · 2021年8月4日

Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

Arxiv

3+阅读 · 2019年3月24日

Piecewise Flat Embedding for Image Segmentation

Arxiv

3+阅读 · 2018年5月20日

Inference Suboptimality in Variational Autoencoders

Arxiv

3+阅读 · 2018年1月10日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员