We conjecture that the difference in generalisation between adaptive and non-adaptive gradient methods stems from the failure of adaptive methods to account for the greater levels of noise associated with flatter directions in their estimates of local curvature. This conjecture, motivated by results in random matrix theory, has implications for optimisation in both simple convex settings and deep neural networks. We demonstrate that the typical schedules used for adaptive methods (with low numerical stability or damping constants) bias relative movement towards flat directions over sharp ones, effectively amplifying the noise-to-signal ratio and harming generalisation. We show that the numerical stability/damping constant used in these methods can be decomposed into a learning rate reduction and a linear shrinkage of the estimated curvature matrix. We then demonstrate significant generalisation improvements by increasing the shrinkage coefficient, closing the generalisation gap entirely in our deep neural network experiments. Finally, we show that other popular modifications to adaptive methods, such as decoupled weight decay and partial adaptivity, also serve to calibrate parameter updates to make better use of sharper, more reliable directions.
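As a minimal sketch of the claimed decomposition (using generic notation not taken from the paper: $\mathbf{H}$ for the estimated curvature matrix, $\delta$ for the numerical stability/damping constant, and $\alpha$ for the learning rate), the damped preconditioned step can be rewritten as
\[
\alpha\,(\mathbf{H} + \delta\mathbf{I})^{-1}
\;=\;
\underbrace{\frac{\alpha}{1+\delta}}_{\text{reduced learning rate}}
\Big(\underbrace{\tfrac{1}{1+\delta}\,\mathbf{H} + \tfrac{\delta}{1+\delta}\,\mathbf{I}}_{\text{linear shrinkage of }\mathbf{H}}\Big)^{-1},
\]
i.e. a smaller effective learning rate applied to a curvature estimate shrunk towards the identity, with shrinkage coefficient $\delta/(1+\delta)$ that grows with the damping constant.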