We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions. This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions. Instead of estimating the parameters with a combination of first-order optimisation methods and back-propagation (as is the state of the art), we propose the use of non-smooth first-order optimisation methods that exploit the specific structure of the novel formulation. We present several numerical results which demonstrate that, compared to more conventional training frameworks, these training approaches can be equally well suited, or even better suited, for training neural network-based classifiers and (denoising) autoencoders with sparse coding.
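To make the derivative claim concrete, the following is a minimal numerical sketch for the special case where the penalty Psi is the indicator function of the non-negative orthant, so that its proximal map is the ReLU activation. It uses the standard identity that the resulting Bregman-type penalty can be written as B(x, z) = 0.5*||x - z||^2 + Psi(x) - M_Psi(z), where M_Psi is the Moreau envelope of Psi, whose gradient in z is prox_Psi(z) - x. All function names below are illustrative, not an API from the paper.

```python
import numpy as np

def relu(z):
    # prox_Psi(z): proximal map of the indicator of {x >= 0}
    return np.maximum(z, 0.0)

def moreau_envelope(z):
    # Moreau envelope of Psi: min_{x>=0} 0.5*||x - z||^2 = 0.5*||min(z, 0)||^2
    return 0.5 * np.sum(np.minimum(z, 0.0) ** 2)

def bregman_loss(x, z):
    # B(x, z) = 0.5*||x - z||^2 + Psi(x) - M_Psi(z); Psi(x) = 0 for feasible x.
    # Non-negative, zero iff x = relu(z), and differentiable in z.
    return 0.5 * np.sum((x - z) ** 2) - moreau_envelope(z)

def bregman_grad_z(x, z):
    # Closed-form gradient in z: prox_Psi(z) - x. No derivative of the
    # (non-smooth) activation function is needed.
    return relu(z) - x

rng = np.random.default_rng(0)
x = relu(rng.standard_normal(5))   # a feasible auxiliary variable, x >= 0
z = rng.standard_normal(5)         # a pre-activation

# Finite-difference check that the closed-form gradient is correct.
eps = 1e-6
g_fd = np.array([
    (bregman_loss(x, z + eps * e) - bregman_loss(x, z - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
print(np.allclose(g_fd, bregman_grad_z(x, z), atol=1e-4))  # True
```

Because the penalty is differentiable in the pre-activation z even though the activation itself is non-smooth, gradients with respect to the network parameters only ever involve the proximal map, not its (possibly non-existent) derivative.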