A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Artificial Neural Networks · Layers · Stochastic Gradient Descent · Neural Networks · SGD

April 1, 2021



Arnulf Jentzen, Adrian Riekert

From arXiv, 29 pages

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
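The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes inputs drawn i.i.d. uniformly from $[0,1]^d$, the squared-error loss $(\mathcal{N}(x) - \xi)^2$ for a constant target $\xi$, the network realization $\mathcal{N}(x) = c + \sum_{i=1}^{H} v_i \max\{w_i \cdot x + b_i, 0\}$, a constant learning rate, and the convention that the ReLU derivative at $0$ is $0$. All function names, the initialization, and the hyperparameters are hypothetical choices for illustration.

```python
import random

# Minimal sketch (hypothetical, not the authors' code): SGD for a network with
# d inputs, H hidden ReLU neurons, and one output neuron, trained on a constant
# target xi with i.i.d. inputs drawn uniformly from [0, 1]^d.


def relu(z):
    return z if z > 0.0 else 0.0


def init_params(d, H, rng):
    # Random initialisation; the uniform(-1, 1) choice is an assumption.
    return {
        "w": [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(H)],  # hidden weights
        "b": [rng.uniform(-1.0, 1.0) for _ in range(H)],  # hidden biases
        "v": [rng.uniform(-1.0, 1.0) for _ in range(H)],  # output weights
        "c": rng.uniform(-1.0, 1.0),  # output bias
    }


def forward(p, x):
    # Hidden pre-activations z_i = w_i . x + b_i and output N(x) = c + sum_i v_i * relu(z_i).
    pre = [sum(wij * xj for wij, xj in zip(wi, x)) + bi for wi, bi in zip(p["w"], p["b"])]
    out = p["c"] + sum(vi * relu(zi) for vi, zi in zip(p["v"], pre))
    return pre, out


def sgd_step(p, x, xi, lr):
    # One SGD step on the single-sample loss (N(x) - xi)^2.
    # The ReLU derivative is taken as 1 if z > 0 and 0 otherwise (assumed convention at z = 0).
    pre, out = forward(p, x)
    g = 2.0 * (out - xi)  # derivative of the loss w.r.t. the network output
    for i, zi in enumerate(pre):
        grad_v = g * relu(zi)
        grad_z = g * p["v"][i] * (1.0 if zi > 0.0 else 0.0)  # uses v_i before its update
        p["v"][i] -= lr * grad_v
        p["b"][i] -= lr * grad_z
        for j, xj in enumerate(x):
            p["w"][i][j] -= lr * grad_z * xj
    p["c"] -= lr * g


def estimate_risk(p, xi, d, rng, m=2000):
    # Monte Carlo estimate of the risk E[(N(X) - xi)^2] on fresh i.i.d. samples.
    total = 0.0
    for _ in range(m):
        x = [rng.random() for _ in range(d)]
        total += (forward(p, x)[1] - xi) ** 2
    return total / m


def train(d=2, H=4, xi=3.0, lr=0.005, steps=20000, seed=0):
    rng = random.Random(seed)
    p = init_params(d, H, rng)
    # The same evaluation seed is used before and after, so both estimates share samples.
    risk_before = estimate_risk(p, xi, d, random.Random(seed + 1))
    for _ in range(steps):
        x = [rng.random() for _ in range(d)]  # i.i.d. input, as assumed in the abstract
        sgd_step(p, x, xi, lr)
    risk_after = estimate_risk(p, xi, d, random.Random(seed + 1))
    return risk_before, risk_after


if __name__ == "__main__":
    before, after = train()
    print(f"estimated risk before: {before:.4f}, after: {after:.4f}")
```

Calling `train()` returns Monte Carlo estimates of the risk before and after training. On this toy setup the second value should be much smaller than the first, which is consistent with, but in no way a substitute for, the convergence theorem; note that the constant target can be matched exactly by the output bias $c$ alone, so the hidden units only need to stop varying with the input.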


Related content

Artificial neural networks


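An artificial neural network (ANN) abstracts the network of neurons in the human brain from an information-processing perspective: it builds a simple model and forms different networks according to different connection patterns. In engineering and academia it is often simply called a neural network or a neural-like network. A neural network is a computational model made up of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called the activation function. Each connection between two nodes carries a weighting applied to the signal passing through it, called a weight; the weights act as the memory of the artificial neural network. The output of the network depends on its connection pattern, its weight values, and its activation functions. The network itself is usually an approximation of some algorithm or function found in nature, or it may express a logical strategy.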
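The description above (nodes with activation functions, weighted connections, output determined by connection pattern, weights, and activations) can be made concrete with a few lines of code. This is a minimal sketch for illustration only: the helper names `neuron`, `dense_layer`, and `tiny_network`, and all weight values, are hypothetical and do not come from any library.

```python
import math

# Hypothetical illustration: a node computes a weighted sum of its incoming
# signals (the connection weights) plus a bias and applies its activation function.


def relu(z):
    return max(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def neuron(inputs, weights, bias, activation):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation):
    # Fully connected layer: every node receives every input.
    return [neuron(inputs, row, b, activation) for row, b in zip(weight_rows, biases)]


def tiny_network(x, hidden_activation):
    # 2 inputs -> 2 hidden nodes -> 1 output node; the weights are made-up numbers.
    hidden = dense_layer(x, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], hidden_activation)
    return neuron(hidden, [1.0, 0.5], 0.25, identity)


print(tiny_network([1.0, 2.0], relu))     # 1.25
print(tiny_network([1.0, 2.0], sigmoid))  # different output, same weights
```

Keeping the weights fixed and swapping only the hidden activation changes the output, which mirrors the statement that the network's output depends on both its weights and its activation functions.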
