Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly in terms of a Volterra integral equation. These results yield precise formulas for the learning and risk trajectories, which reveal a mechanism of implicit conditioning that explains the efficiency of SGD relative to GD. We also prove that the noise from SGD negatively impacts generalization performance, ruling out the possibility of any type of implicit regularization in this context. Finally, we show how to adapt the HSGD formalism to include streaming SGD, which allows us to produce an exact prediction for the excess risk of multi-pass SGD relative to that of streaming SGD (bootstrap risk).
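Below is a minimal, hypothetical illustration of the setting described in the abstract: multi-pass (single-sample) SGD versus full-batch GD on a high-dimensional least-squares problem, i.e. a convex quadratic. The data model, problem sizes, and step sizes are illustrative choices and do not reproduce the paper's experimental protocol or the HSGD/Volterra predictions.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's experiments):
# multi-pass SGD vs. full-batch GD on a high-dimensional least-squares
# problem f(x) = ||Ax - b||^2 / (2n), which is a convex quadratic.

rng = np.random.default_rng(0)
n, d = 2000, 500                                  # samples, features
A = rng.standard_normal((n, d)) / np.sqrt(d)      # rows a_i with ||a_i||^2 ~ 1
x_star = rng.standard_normal(d)
b = A @ x_star + 0.1 * rng.standard_normal(n)     # noisy labels

def risk(x):
    """Empirical risk f(x) = ||Ax - b||^2 / (2n)."""
    return 0.5 * np.mean((A @ x - b) ** 2)

def sgd_multipass(passes=20, lr=0.5):
    """Single-sample SGD, one shuffled epoch per pass; the per-sample
    gradient (a_i^T x - b_i) a_i is an unbiased estimate of grad f."""
    x = np.zeros(d)
    hist = [risk(x)]
    for _ in range(passes):
        for i in rng.permutation(n):
            x -= lr * (A[i] @ x - b[i]) * A[i]
        hist.append(risk(x))
    return np.array(hist)

def gd(steps=20):
    """Full-batch GD with step size 1 / lambda_max of the Hessian A^T A / n."""
    H = A.T @ A / n
    lr = 1.0 / np.linalg.eigvalsh(H)[-1]
    x = np.zeros(d)
    hist = [risk(x)]
    for _ in range(steps):
        x -= lr * (A.T @ (A @ x - b)) / n
        hist.append(risk(x))
    return np.array(hist)

print("multi-pass SGD risk per pass:", np.round(sgd_multipass(), 4))
print("full-batch GD risk per step :", np.round(gd(), 4))
```

One SGD pass and one GD step each touch every sample once, so comparing the risk per pass against the risk per step gives a roughly compute-matched view of the efficiency gap discussed in the abstract.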