A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions - Zhuanzhi (专知) Papers

Functionals · Gradient descent · Artificial neural networks · Neural Networks · Optimizers

February 19, 2021

A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek

From arXiv, 23 pages

Gradient descent optimization algorithms are the standard ingredients that are used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, until today there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in the case of the most basic variant of gradient descent optimization algorithms, the plain vanilla gradient descent method, it remains an open problem to prove or disprove the conjecture that gradient descent converges in the training of ANNs. In this article we solve this problem in the special situation where the target function under consideration is a constant function. More specifically, in the case of constant target functions we prove, for the training of rectified fully-connected feedforward ANNs with one hidden layer, that the risk of the gradient descent method does indeed converge to zero. Our mathematical analysis strongly exploits the property that the rectifier function is the activation function used in the considered ANNs. A key contribution of this work is to explicitly specify a Lyapunov function for the gradient flow system of the ANN parameters. This Lyapunov function is the central tool in our convergence proof of the gradient descent method.
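As a rough illustration of the setting described in the abstract, the following pure-Python sketch runs plain gradient descent on a ReLU network with one hidden layer and a constant target. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional input, the 21-point grid on [0, 1] standing in for the input distribution, the width of 3, the hand-picked initial parameters, the step size, the step count, and the convention that the ReLU derivative at 0 is 0. The names `network`, `risk`, `gradient_step` and `train` are likewise hypothetical.

```python
def relu(z):
    """Rectifier activation."""
    return z if z > 0.0 else 0.0


def network(params, x):
    """One-hidden-layer ReLU network: sum_j v_j * relu(w_j * x + b_j) + c."""
    w, b, v, c = params
    hidden = sum(vj * relu(wj * x + bj) for wj, bj, vj in zip(w, b, v))
    return hidden + c


def risk(params, xs, target):
    """Mean squared error against a constant target (assumed empirical risk)."""
    return sum((network(params, x) - target) ** 2 for x in xs) / len(xs)


def gradient_step(params, xs, target, lr):
    """One plain gradient descent step on the empirical risk."""
    w, b, v, c = params
    n = len(xs)
    gw = [0.0] * len(w)
    gb = [0.0] * len(w)
    gv = [0.0] * len(w)
    gc = 0.0
    for x in xs:
        r = network(params, x) - target  # residual at this input
        gc += 2.0 * r / n
        for j in range(len(w)):
            pre = w[j] * x + b[j]
            if pre > 0.0:  # assumption: ReLU derivative is 0 at pre <= 0
                gv[j] += 2.0 * r * pre / n
                gw[j] += 2.0 * r * v[j] * x / n
                gb[j] += 2.0 * r * v[j] / n
    return (
        [wj - lr * g for wj, g in zip(w, gw)],
        [bj - lr * g for bj, g in zip(b, gb)],
        [vj - lr * g for vj, g in zip(v, gv)],
        c - lr * gc,
    )


def train(steps=4000, lr=0.05, target=1.0):
    """Run gradient descent and record the risk before and after every step."""
    xs = [i / 20 for i in range(21)]  # assumed grid on [0, 1]
    # Hand-picked (hypothetical) initial values: inner weights, inner biases,
    # outer weights, outer bias.
    params = ([0.5, -0.4, 0.3], [0.1, 0.2, -0.1], [0.3, -0.2, 0.4], 0.0)
    risks = [risk(params, xs, target)]
    for _ in range(steps):
        params = gradient_step(params, xs, target, lr)
        risks.append(risk(params, xs, target))
    return params, risks


if __name__ == "__main__":
    _, risks = train()
    print("initial risk:", risks[0])
    print("final risk:  ", risks[-1])
```

Calling `train()` returns the final parameters together with the recorded risk values, so the decay of the risk can be inspected directly. This is only a numerical experiment under the assumptions listed above; it does not reproduce the Lyapunov-function argument of the paper.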

Related content

INRIA: latest "Machine Learning Theory" course notes, 176-page PDF

Zhuanzhi Member Services

51+ reads · December 14, 2020

[Google] Smooth Adversarial Training

Zhuanzhi Member Services

49+ reads · July 4, 2020

[ICML 2020] On the Generalization Benefit of Noise in Stochastic Gradient Descent

Zhuanzhi Member Services

19+ reads · June 29, 2020

Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

Zhuanzhi Member Services

54+ reads · March 5, 2020

[Paper] Optimization for deep learning: theory and algorithms

Zhuanzhi Member Services

148+ reads · December 28, 2019

New book release: "Fairness and Machine Learning", with 181-page PDF download

Zhuanzhi Member Services

78+ reads · October 26, 2019

[Latest edition of machine learning foundations] "Mathematics for Machine Learning", 417-page PDF

Zhuanzhi Member Services

244+ reads · October 21, 2019

"DeepGCNs: Making GCNs Go as Deep as CNNs"

Zhuanzhi Member Services

31+ reads · October 17, 2019

Keras, François Chollet, "Deep Learning with Python", 386-page PDF

Zhuanzhi Member Services

160+ reads · October 12, 2019

Latest reinforcement learning tutorial, 17-page PDF

Zhuanzhi Member Services

181+ reads · October 11, 2019

[Paper notes] An accessible introduction to few-shot text classification (1)

Deep Learning and Natural Language Processing

7+ reads · April 8, 2020

tf.GradientTape explained in detail

TensorFlow

120+ reads · February 21, 2020

Unsupervised Meta-Learning for reinforcement learning

CreateAMind

18+ reads · January 7, 2019

Setting the learning rate for neural networks

Machine Learning Research Society

4+ reads · March 3, 2018

ResNet, AlexNet, VGG, Inception: understanding various convolutional network architectures

Global Artificial Intelligence

19+ reads · December 17, 2017

[Recommended] ResNet, AlexNet, VGG, Inception: understanding various convolutional network architectures

Machine Learning Research Society

20+ reads · December 17, 2017

A step-by-step guide to estimating the optimal learning rate for deep neural networks (with code and tutorial)

Data Analysis

6+ reads · November 30, 2017

[Paper] A summary of variational inference

Machine Learning Research Society

39+ reads · November 16, 2017

Auto-Encoding GAN

CreateAMind

7+ reads · August 4, 2017

A family tree of reinforcement learning

CreateAMind

26+ reads · August 2, 2017

Sequential convergence of AdaGrad algorithm for smooth convex optimization

arXiv

0+ reads · April 13, 2021

Stein variational gradient descent with local approximations

arXiv

1+ reads · April 13, 2021

First-order and second-order variants of the gradient descent in a unified framework

arXiv

0+ reads · April 13, 2021

A Recipe for Global Convergence Guarantee in Deep Neural Networks

arXiv

0+ reads · April 12, 2021

LQR with Tracking: A Zeroth-order Approach and Its Global Convergence

arXiv

0+ reads · April 12, 2021

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

arXiv

0+ reads · April 12, 2021

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent

arXiv

0+ reads · April 12, 2021

Linearly Constrained Neural Networks

arXiv

0+ reads · April 12, 2021

Differential Dynamic Programming Neural Optimizer

arXiv

7+ reads · June 29, 2020

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

arXiv

8+ reads · November 21, 2018
