Shift-Curvature, SGD, and Generalization - 专知 paper listing


Topics: SGD · curvature · Performer · minima · generalization theory

July 27, 2022


Arwen V. Bradley, Carlos Alberto Gomez-Uribe, Manish Reddy Vuyyuru

A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that SGD discourages curvature. We offer a more complete and nuanced view in support of both. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The three curvature-mediated contributions to test performance are reparametrization-invariant although curvature is not. The shift in the shift-curvature is the line connecting train and test local minima, which differ due to dataset sampling or distribution shift. Although the shift is unknown at training time, the shift-curvature can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between deep versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature may be the most significant of the three mechanisms. Our experiments confirm the impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.
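The shift-curvature idea can be illustrated with a toy calculation. The sketch below is not the paper's derivation. It assumes a purely quadratic loss and assumes the test loss is the train loss translated by a shift vector, so that evaluating the test loss at the train minimum costs roughly half the shift times the Hessian times the shift. The function name, the shift, and both Hessians are hypothetical values chosen for illustration.

```python
import numpy as np

def shift_curvature_penalty(hessian, shift):
    """Second-order excess test loss, 0.5 * s^T H s, at the train minimum.

    Assumes a quadratic loss whose test minimum is displaced from the
    train minimum by `shift` (a toy model, not the paper's full result).
    """
    return 0.5 * shift @ hessian @ shift

# Hypothetical values: the same shift applied to two minima of different curvature.
shift = np.array([0.1, -0.2])
flat_hessian = np.diag([1.0, 2.0])     # low-curvature minimum
sharp_hessian = np.diag([10.0, 20.0])  # high-curvature minimum

flat_penalty = shift_curvature_penalty(flat_hessian, shift)
sharp_penalty = shift_curvature_penalty(sharp_hessian, shift)

print(f"flat minimum penalty:  {flat_penalty:.3f}")
print(f"sharp minimum penalty: {sharp_penalty:.3f}")
```

Under these assumptions the penalty scales linearly with the Hessian, which is consistent with the abstract's point that lowering overall curvature can mitigate the effect even when the shift itself is unknown at training time.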

