The training of artificial neural networks (ANNs) is a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions of affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function, which maps a real number to its positive part: $ \mathbb{R} \ni x \mapsto \max\{ x, 0 \} \in \mathbb{R} $. In this article we propose and analyze a modified variant of the standard training procedure for such ReLU ANNs: we propose to restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space. This submanifold is a strict $ C^{ \infty } $-submanifold of the entire ANN parameter space that seems to enjoy better regularity properties than the entire ANN parameter space, yet it is sufficiently large and sufficiently high dimensional to represent all ANN realization functions that can be represented through the entire ANN parameter space. In the special situation of shallow ANNs with just one-dimensional ANN layers we also prove, for every Lipschitz continuous target function, that every gradient flow trajectory on this large submanifold of the ANN parameter space is globally bounded. For the standard gradient flow on the entire ANN parameter space with Lipschitz continuous target functions, it remains an open research problem to prove or disprove the global boundedness of gradient flow trajectories, even in the situation of shallow ANNs with just one-dimensional ANN layers.
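To make the idea of a smaller parameter set that still represents every realization function concrete, the following is a minimal sketch for a shallow ReLU ANN with one-dimensional input, hidden, and output layers. It relies only on the positive homogeneity of ReLU, $ \max\{ \lambda x, 0 \} = \lambda \max\{ x, 0 \} $ for $ \lambda > 0 $. The parametrization `(w, b, v, c)`, the constraint $ w^2 + b^2 = 1 $, and the helper names are illustrative assumptions; the abstract does not specify the submanifold, so this should not be read as the article's construction.

```python
import math


def relu(x):
    """Rectified linear unit: maps a real number to its positive part."""
    return max(x, 0.0)


def realization(theta, x):
    """Realization of a shallow ReLU ANN with one hidden neuron.

    theta = (w, b, v, c) is a hypothetical parametrization:
    inner affine map x -> w*x + b, outer affine map y -> v*y + c.
    """
    w, b, v, c = theta
    return v * relu(w * x + b) + c


def rescale(theta, lam):
    """Scale the inner weights by lam > 0 and the outer weight by 1/lam.

    By positive homogeneity of ReLU the realization is unchanged.
    """
    w, b, v, c = theta
    return (lam * w, lam * b, v / lam, c)


def normalize(theta):
    """Pick a representative with w^2 + b^2 = 1 (an assumed constraint)."""
    w, b, v, c = theta
    norm = math.hypot(w, b)
    if norm == 0.0:
        # The realization is the constant c; (1, 0, 0, c) represents it too.
        return (1.0, 0.0, 0.0, c)
    return rescale(theta, 1.0 / norm)


theta = (3.0, -4.0, 0.5, 1.0)
theta_n = normalize(theta)

# Compare both parameter vectors on a few sample inputs.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
max_gap = max(abs(realization(theta, x) - realization(theta_n, x)) for x in xs)
```

In this sketch `theta` and `theta_n` are different parameter vectors with the same realization function up to floating-point rounding, which illustrates how a lower-dimensional constraint set can remain large enough to represent every realization.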