When does SGD favor flat minima? A quantitative characterization via linear stability - Zhuanzhi (专知) Papers

SGD · flat minima · minima · Frobenius norm · linear

July 6, 2022

When does SGD favor flat minima? A quantitative characterization via linear stability

Lei Wu, Mingze Wang, Weijie Su

The observation that stochastic gradient descent (SGD) favors flat minima has played a fundamental role in understanding the implicit regularization of SGD and in guiding the tuning of hyperparameters. In this paper, we provide a quantitative explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss. We prove that if a global minimum $\theta^*$ is linearly stable for SGD, then it must satisfy $\|H(\theta^*)\|_F \leq O(\sqrt{B}/\eta)$, where $\|H(\theta^*)\|_F$, $B$, and $\eta$ denote the Frobenius norm of the Hessian at $\theta^*$, the batch size, and the learning rate, respectively. Otherwise, SGD will escape from that minimum exponentially fast. Hence, for minima accessible to SGD, the flatness (as measured by the Frobenius norm of the Hessian) is bounded independently of the model size and sample size. The key to obtaining these results is exploiting the particular geometry awareness of SGD noise: 1) the noise magnitude is proportional to the loss value; 2) the noise directions concentrate in the sharp directions of the local landscape. This property of SGD noise provably holds for linear networks and random feature models (RFMs) and is empirically verified for nonlinear networks. Moreover, the validity and practical relevance of our theoretical findings are justified by extensive numerical experiments.
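
The stability statement in the abstract can be made concrete with a small numerical sketch. The code below is not taken from the paper: it assumes an over-parameterized linear model with square loss, and every name in it (`make_problem`, `hessian_fro`, `run_sgd`, the problem sizes, the `blowup` threshold) is a hypothetical choice made for illustration. It starts SGD close to an interpolating global minimum and reports whether the iterates stay nearby or escape, together with the quantity $\eta\,\|H\|_F/\sqrt{B}$. The constant hidden in the $O(\cdot)$ is not given in the abstract, so that ratio is only an indicative diagnostic and not a sharp threshold.

```python
import numpy as np

def make_problem(n=20, d=50, seed=0):
    """Over-parameterized linear regression (d > n) with exactly fitted labels."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    theta_star = rng.standard_normal(d)
    y = X @ theta_star  # zero training loss at theta_star: a global minimum
    return X, y, theta_star

def loss(theta, X, y):
    """Square loss: (1 / 2n) * sum of squared residuals."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2)

def hessian_fro(X):
    """Frobenius norm of the Hessian H = X^T X / n of the square loss."""
    H = X.T @ X / X.shape[0]
    return np.linalg.norm(H, "fro")

def run_sgd(X, y, theta0, eta, batch, steps=2000, blowup=1e6, seed=1):
    """Run mini-batch SGD; stop early if the loss grows by the factor `blowup`."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    start = loss(theta, X, y)
    for _ in range(steps):
        idx = rng.choice(X.shape[0], size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ theta - yb) / batch
        theta -= eta * grad
        current = loss(theta, X, y)
        if current > blowup * start:
            return start, current, True   # escaped from the minimum
    return start, loss(theta, X, y), False  # stayed near the minimum

X, y, theta_star = make_problem()
# Start from a small perturbation of the global minimum.
theta0 = theta_star + 1e-3 * np.random.default_rng(2).standard_normal(theta_star.shape)
fro = hessian_fro(X)
batch = 1

results = {}
for eta in (0.01, 1.0):
    start, end, escaped = run_sgd(X, y, theta0, eta, batch)
    results[eta] = (start, end, escaped)
    ratio = eta * fro / np.sqrt(batch)  # indicative only; the O(.) constant is unknown here
    print(f"eta={eta}: eta*||H||_F/sqrt(B)={ratio:.3g}, "
          f"loss {start:.3g} -> {end:.3g}, escaped={escaped}")
```

In this toy setting the small learning rate keeps the iterates near the minimum while the large one drives the loss up within a few steps, which is the qualitative behavior the abstract describes; it does not verify the theorem or its constants.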
