Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions - 专知 (Zhuanzhi) Papers


Momentum · Batch Size · Learning · Identifiable · Saturation

June 2, 2022

Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions


Kiwon Lee, Andrew N. Cheng, Courtney Paquette, Elliot Paquette

We analyze the dynamics of large batch stochastic gradient descent with momentum (SGD+M) on the least squares problem when both the number of samples and dimensions are large. In this setting, we show that the dynamics of SGD+M converge to a deterministic discrete Volterra equation as dimension increases, which we analyze. We identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of $\mathcal{O}(1/\sqrt{\kappa})$, matching optimal full-batch momentum (in particular performing as well as a full-batch but with a fraction of the size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance.
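To make the setting concrete, here is a minimal sketch of mini-batch SGD with heavy-ball momentum on a least squares problem. It is an illustration only: the function names (`make_problem`, `loss`, `sgd_momentum`), the Gaussian data, and the fixed learning rate and momentum values are assumptions made for this example. They are not the paper's explicit parameter choices, which depend on the Hessian spectrum and the ICR, and the update may be normalized differently from the one analyzed in the paper.

```python
import random


def make_problem(n, d, seed=0):
    """Build a noiseless least squares problem b = A x_star with Gaussian A (illustrative data)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = [sum(a * xj for a, xj in zip(row, x_star)) for row in A]
    return A, b


def loss(A, b, x):
    """Least squares objective: (1 / 2n) * sum_i (a_i . x - b_i)^2."""
    n = len(A)
    total = 0.0
    for row, bi in zip(A, b):
        residual = sum(a * xj for a, xj in zip(row, x)) - bi
        total += residual ** 2
    return total / (2 * n)


def sgd_momentum(A, b, batch_size, lr, momentum, steps, seed=1):
    """Run mini-batch SGD with heavy-ball momentum, starting from x = 0."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    v = [0.0] * d  # velocity: decaying accumulation of past stochastic gradients
    for _ in range(steps):
        # Draw a mini-batch of row indices without replacement.
        batch = rng.sample(range(n), batch_size)
        grad = [0.0] * d
        for i in batch:
            residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
            for j in range(d):
                grad[j] += residual * A[i][j] / batch_size
        # Heavy-ball update: damp the old velocity, then step along the new gradient.
        v = [momentum * vj - lr * gj for vj, gj in zip(v, grad)]
        x = [xj + vj for xj, vj in zip(x, v)]
    return x
```

Calling `sgd_momentum(A, b, batch_size=20, lr=0.05, momentum=0.5, steps=500)` on a problem from `make_problem(200, 5)` and comparing `loss` before and after gives a quick check that the iterates are converging. Varying `batch_size` in such a toy run is one informal way to explore the batch-size dependence the abstract describes, although a problem this small is far from the high-dimensional regime of the analysis.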



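Related content

The momentum method (Polyak, 1964) is designed to accelerate learning, especially in the presence of high curvature, small but consistent gradients, or noisy gradients. The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.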
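The accumulation described above can be sketched in a few lines. The names `momentum_velocities` and `unrolled_velocity`, and the convention v ← βv − ηg, are assumptions chosen for illustration; other equivalent parameterizations of momentum exist. The second function unrolls the recursion to show that the velocity is a sum of past gradients weighted by powers of β.

```python
def momentum_velocities(grads, lr, beta):
    """Apply v <- beta * v - lr * g for each gradient and return every velocity."""
    v = 0.0
    history = []
    for g in grads:
        v = beta * v - lr * g
        history.append(v)
    return history


def unrolled_velocity(grads, lr, beta):
    """Closed form of the last velocity: -lr * sum_k beta^(t-1-k) * g_k."""
    t = len(grads)
    return -lr * sum(beta ** (t - 1 - k) * g for k, g in enumerate(grads))
```

With a constant gradient g, the velocity approaches −ηg / (1 − β), so a small but consistent gradient is amplified by a factor of 1 / (1 − β). For example, with η = 0.1, β = 0.5 and g = 1 the velocities are −0.1, −0.15, −0.175, −0.1875, tending to −0.2.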

