We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, when the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, we prove generalisation error bounds that hold under a wider range of step sizes than in previous work. Out-of-sample guarantees are then obtained by decomposing the test error into generalisation, optimisation, and approximation errors, each of which can be bounded and traded off with respect to the algorithmic parameters, the sample size, and the magnitude of this eigenvalue. In the case of a two-layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity; specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., the network scaling. This allows test error guarantees to be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, we gain insights into the implicit bias of neural network scaling, which are further supported by experimental findings.
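The two notions at the heart of the abstract can be sketched in standard form. This is an illustrative rendering only: the symbols $\hat{R}$ (empirical risk), $R$ (population risk), $\hat{w}_T$ (gradient descent output after $T$ steps), $w^\star$ (population risk minimiser), and $\epsilon$ (weak-convexity parameter) are assumed notation, not quoted from the paper's statements.

```latex
% Weak convexity: the empirical risk's Hessian has its smallest
% (negative) eigenvalue bounded in magnitude by some \epsilon \ge 0.
\nabla^2 \hat{R}(w) \succeq -\epsilon I \quad \text{for all } w.

% Exact telescoping identity behind the test-error decomposition;
% the three bracketed terms correspond to the generalisation,
% optimisation, and approximation errors discussed in the abstract
% (the term labels are an illustrative convention).
R(\hat{w}_T) - R(w^\star)
  = \underbrace{\bigl(R(\hat{w}_T) - \hat{R}(\hat{w}_T)\bigr)}_{\text{generalisation}}
  + \underbrace{\bigl(\hat{R}(\hat{w}_T) - \hat{R}(w^\star)\bigr)}_{\text{optimisation}}
  + \underbrace{\bigl(\hat{R}(w^\star) - R(w^\star)\bigr)}_{\text{approximation}}
```

Each bracketed term can then be bounded separately, with the weak-convexity parameter $\epsilon$ entering both the stability-based generalisation bound and the optimisation analysis, which is the trade-off the abstract describes.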