Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks because of their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and require too many preconditions. In this paper, to overcome these drawbacks, we propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks, and recurrent neural networks (including long short-term memory networks), by using error backpropagation and our average-approximation RLS method, together with the equivalent gradients of the linear least squares loss function with respect to the linear outputs of the hidden layers. Compared with previous RLS optimization algorithms, our algorithms are simple and elegant. They can be viewed as an improved stochastic gradient descent (SGD) algorithm, which uses the inverse autocorrelation matrix of each layer as the adaptive learning rate. Their time and space complexities are only several times those of SGD. They only require the loss function to be the mean squared error and the activation function of the output layer to be invertible. In fact, our algorithms can also be used in combination with other first-order optimization algorithms without requiring these two preconditions. In addition, we present two improved methods for our algorithms. Finally, we demonstrate their effectiveness compared with the Adam algorithm on the MNIST, CIFAR-10, and IMDB datasets, and investigate the influence of their hyperparameters experimentally.
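To make the "inverse autocorrelation matrix as adaptive learning rate" interpretation concrete, the following is a minimal sketch of a generic per-layer RLS-style update in NumPy. It is not the paper's average-approximation RLS method; the class name, the forgetting-factor handling, and the parameter names are illustrative assumptions, shown only to indicate how a matrix P can precondition an ordinary SGD gradient.

```python
import numpy as np

class RLSLayerOptimizer:
    """Illustrative sketch: RLS-style preconditioning for one dense layer.

    The layer keeps an inverse autocorrelation matrix P of its inputs and
    uses it as a matrix-valued (adaptive) learning rate for the gradient,
    which is the sense in which such algorithms can be viewed as an
    improved SGD. This is NOT the exact algorithm proposed in the paper.
    """

    def __init__(self, n_inputs, delta=1.0, lam=0.99, lr=1.0):
        self.P = np.eye(n_inputs) / delta   # inverse autocorrelation matrix
        self.lam = lam                      # forgetting factor (assumed)
        self.lr = lr                        # global step size

    def update_P(self, x):
        """Rank-one Sherman-Morrison update of P with layer input x."""
        Px = self.P @ x
        denom = self.lam + x @ Px
        self.P = (self.P - np.outer(Px, Px) / denom) / self.lam

    def step(self, W, grad_W, x):
        """One update: precondition the gradient grad_W (n_out, n_in) by P."""
        self.update_P(x)
        return W - self.lr * grad_W @ self.P
```

The key design point this sketch illustrates is that P replaces the scalar learning rate of SGD, so the cost over plain SGD is dominated by the rank-one update and the matrix-vector products, i.e., a small constant factor per layer.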