In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
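Purely as an illustration of the setting described above (and not part of the article's own analysis), the following minimal Python sketch runs plain SGD on a one-hidden-layer ReLU network whose target function is constant. The input dimension d, hidden width H, constant value c, learning rate gamma, input distribution, and initialization are arbitrary illustrative choices here; the precise learning-rate and data assumptions under which the convergence result holds are those stated in the article.

```python
import numpy as np

# Illustrative sketch: SGD for a shallow ReLU network fitting a constant target.
d, H = 3, 16          # input dimension and number of hidden neurons (illustrative)
c = 2.0               # value of the constant target function (illustrative)
gamma = 1e-3          # small constant learning rate (illustrative)
rng = np.random.default_rng(0)

# Network parameters: hidden weights/biases, output weights, output bias.
W = rng.standard_normal((H, d))
b = rng.standard_normal(H)
v = rng.standard_normal(H)
a = 0.0

def realization(x):
    """Realization of the network: x -> v . ReLU(W x + b) + a."""
    return v @ np.maximum(W @ x + b, 0.0) + a

for n in range(100_000):
    x = rng.uniform(-1.0, 1.0, size=d)        # i.i.d. input sample
    h = np.maximum(W @ x + b, 0.0)            # hidden-layer activations
    err = (v @ h + a) - c                     # residual w.r.t. the constant target
    active = (W @ x + b > 0.0).astype(float)  # ReLU (sub)gradient indicator
    # Stochastic (sub)gradient step for half the squared error (f(x) - c)^2 / 2,
    # computed from the pre-update parameter values.
    W -= gamma * err * (v * active)[:, None] * x[None, :]
    b -= gamma * err * v * active
    v -= gamma * err * h
    a -= gamma * err

# Monte Carlo estimate of the risk after training; it should be close to zero.
test = [(realization(rng.uniform(-1.0, 1.0, size=d)) - c) ** 2 for _ in range(1000)]
print("empirical risk estimate:", np.mean(test))
```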