This paper presents a novel technique based on gradient boosting to train a shallow neural network (NN). Gradient boosting is an additive expansion algorithm in which a series of models is trained sequentially to approximate a given function. A neural network can also be seen as an additive model, in which the scalar product of the responses of the last hidden layer and its weights provides the final output of the network. Instead of training the network as a whole, the proposed algorithm trains the network sequentially in $T$ steps. First, the bias term of the network is initialized with a constant approximation that minimizes the average loss over the data. Then, at each step, a portion of the network, composed of $J$ neurons, is trained to approximate the pseudo-residuals on the training data computed from the previous iterations. Finally, the $T$ partial models and the bias are combined into a single NN with $T \times J$ neurons in the hidden layer. Extensive experiments on classification and regression tasks are carried out, showing competitive generalization performance with respect to neural networks trained with standard solvers such as Adam, L-BFGS and SGD. Furthermore, we show that the design of the proposed method makes it possible to switch off a number of hidden units at test time (the units that were trained last) without a significant reduction in generalization ability. This allows the model to be adapted on the fly to different classification speed requirements.
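The following is a minimal sketch of the sequential, boosting-style training procedure described above, written for regression with squared loss so that the pseudo-residuals reduce to simple residuals. The hyper-parameter values, the use of tanh units, the inner gradient-descent loop, and all function names (`fit_block`, `boost_shallow_nn`, `predict`) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed details): gradient-boosted training of a shallow NN.
import numpy as np

rng = np.random.default_rng(0)

def fit_block(X, r, J=5, steps=500, lr=0.1):
    """Fit a small block of J tanh hidden units to the pseudo-residuals r."""
    n, d = X.shape
    W = rng.normal(scale=0.5, size=(d, J))   # input-to-hidden weights
    b = np.zeros(J)                          # hidden biases
    v = np.zeros(J)                          # hidden-to-output weights
    for _ in range(steps):
        H = np.tanh(X @ W + b)               # (n, J) hidden activations
        err = H @ v - r                      # gradient of 0.5 * squared error
        grad_v = H.T @ err / n
        grad_H = np.outer(err, v) * (1 - H ** 2)
        grad_W = X.T @ grad_H / n
        grad_b = grad_H.mean(axis=0)
        v -= lr * grad_v
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b, v

def boost_shallow_nn(X, y, T=10, J=5):
    """Sequentially train T blocks of J neurons, each on the current pseudo-residuals."""
    bias = y.mean()                          # constant initial approximation
    F = np.full(len(y), bias, dtype=float)   # current ensemble prediction
    blocks = []
    for _ in range(T):
        r = y - F                            # pseudo-residuals for squared loss
        W, b, v = fit_block(X, r, J=J)
        blocks.append((W, b, v))
        F += np.tanh(X @ W + b) @ v
    # Merge the T partial models into one hidden layer with T * J units.
    W_all = np.hstack([W for W, _, _ in blocks])
    b_all = np.concatenate([b for _, b, _ in blocks])
    v_all = np.concatenate([v for _, _, v in blocks])
    return W_all, b_all, v_all, bias

def predict(X, W_all, b_all, v_all, bias, n_units=None):
    """Optionally keep only the first n_units hidden units (the earliest-trained ones)."""
    if n_units is None:
        n_units = W_all.shape[1]
    H = np.tanh(X @ W_all[:, :n_units] + b_all[:n_units])
    return H @ v_all[:n_units] + bias
```

Because each block's hidden units and output weights are simply concatenated, truncating the concatenated weights to the first $t \times J$ units in `predict` is equivalent to switching off the units trained in the last $T - t$ boosting steps, which is the mechanism that allows trading accuracy for speed at test time.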