The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent - 专知论文

会员服务 ·

0

正则化项 · SGD · 通用动力公司 · 随机梯度下降 · 泛化理论 ·

2023 年 5 月 27 日

The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent

翻译：暂无翻译

Lei Wu,Weijie J. Su

from arxiv, ICML 2023 camera ready

In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of {\em dynamical stability} (Wu et al., 2018). We start by revising existing stability analyses of SGD, showing how the Frobenius norm and trace of Hessian relate to different notions of stability. Notably, if a global minimum is linearly stable for SGD, then the trace of Hessian must be less than or equal to $2/\eta$, where $\eta$ denotes the learning rate. By contrast, for gradient descent (GD), the stability imposes a similar constraint but only on the largest eigenvalue of Hessian. We then turn to analyze the generalization properties of these stable minima, focusing specifically on two-layer ReLU networks and diagonal linear networks. Notably, we establish the {\em equivalence} between these metrics of sharpness and certain parameter norms for the two models, which allows us to show that the stable minima of SGD provably generalize well. By contrast, the stability-induced regularization of GD is provably too weak to ensure satisfactory generalization. This discrepancy provides an explanation of why SGD often generalizes better than GD. Note that the learning rate (LR) plays a pivotal role in the strength of stability-induced regularization. As the LR increases, the regularization effect becomes more pronounced, elucidating why SGD with a larger LR consistently demonstrates superior generalization capabilities. Additionally, numerical experiments are provided to support our theoretical findings.

翻译：暂无翻译

0

相关内容

正则化项

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

多伦多大学Fall2020《机器学习导论》课程，不可错过！

专知会员服务

55+阅读 · 2020年10月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

组蛋白H3K4甲基转移酶Mll2/Wbp7参与造血免疫组织发育分化的功能与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PRC1在胚胎干细胞生长和分化中识别和抑制靶基因的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

流形上的Bakry-Emery曲率，泛函不等式和热核分析

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

能量临界情形的非线性Schrodinger方程

国家自然科学基金

0+阅读 · 2011年12月31日

水牛植入前胚胎体外培养过程中抗氧化损伤机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

用于生长因子类药物生殖发育毒性评价的小鼠胚胎干细胞特异性分子标记物的筛选

国家自然科学基金

0+阅读 · 2009年12月31日

βatenin在胚胎干细胞自我更新中的作用及其机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

Unveiling Bias in Sequential Decision Making: A Causal Inference Approach for Stochastic Service Systems

Arxiv

0+阅读 · 2023年7月15日

The Interpolating Information Criterion for Overparameterized Models

Arxiv

0+阅读 · 2023年7月15日

Causal Regularization: On the trade-off between in-sample risk and out-of-sample risk guarantees

Arxiv

0+阅读 · 2023年7月14日

Bounded-memory adjusted scores estimation in generalized linear models with large data sets

Arxiv

0+阅读 · 2023年7月14日

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Arxiv

0+阅读 · 2023年7月13日

Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent

Arxiv

0+阅读 · 2023年7月13日

Accelerated stochastic approximation with state-dependent noise

Arxiv

0+阅读 · 2023年7月13日

On the Effective Horizon of Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年7月13日

Federated Causal Inference in Heterogeneous Observational Data

Arxiv

24+阅读 · 2021年8月10日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

VIP会员

文章信息

相关主题

通用动力公司

随机梯度下降

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

多伦多大学Fall2020《机器学习导论》课程，不可错过！

专知会员服务

55+阅读 · 2020年10月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

相关论文

Unveiling Bias in Sequential Decision Making: A Causal Inference Approach for Stochastic Service Systems

Arxiv

0+阅读 · 2023年7月15日

The Interpolating Information Criterion for Overparameterized Models

Arxiv

0+阅读 · 2023年7月15日

Causal Regularization: On the trade-off between in-sample risk and out-of-sample risk guarantees

Arxiv

0+阅读 · 2023年7月14日

Bounded-memory adjusted scores estimation in generalized linear models with large data sets

Arxiv

0+阅读 · 2023年7月14日

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Arxiv

0+阅读 · 2023年7月13日

Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent

Arxiv

0+阅读 · 2023年7月13日

Accelerated stochastic approximation with state-dependent noise

Arxiv

0+阅读 · 2023年7月13日

On the Effective Horizon of Inverse Reinforcement Learning

Arxiv

0+阅读 · 2023年7月13日

Federated Causal Inference in Heterogeneous Observational Data

Arxiv

24+阅读 · 2021年8月10日

Meta-Learning with Implicit Gradients

Meta-Learning with Implicit Gradients

Arxiv

13+阅读 · 2019年9月10日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

组蛋白H3K4甲基转移酶Mll2/Wbp7参与造血免疫组织发育分化的功能与机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PRC1在胚胎干细胞生长和分化中识别和抑制靶基因的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

Schrodinger-Poisson方程的若干问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

流形上的Bakry-Emery曲率，泛函不等式和热核分析

国家自然科学基金

0+阅读 · 2012年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

能量临界情形的非线性Schrodinger方程

国家自然科学基金

0+阅读 · 2011年12月31日

水牛植入前胚胎体外培养过程中抗氧化损伤机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

用于生长因子类药物生殖发育毒性评价的小鼠胚胎干细胞特异性分子标记物的筛选

国家自然科学基金

0+阅读 · 2009年12月31日

βatenin在胚胎干细胞自我更新中的作用及其机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员