In this paper, we introduce a novel optimization algorithm for machine learning model training called Normalized Stochastic Gradient Descent (NSGD), inspired by Normalized Least Mean Squares (NLMS) from adaptive filtering. When training a high-complexity model on a large dataset, the choice of learning rate is critical, as a poor choice of optimizer parameters can lead to divergence. The algorithm updates the network weights using the stochastic gradient but applies $\ell_1$- and $\ell_2$-based normalizations to the learning rate parameter, similar to the NLMS algorithm. The main difference from existing normalization methods is that we do not include the error term in the normalization process; instead, we normalize the update term using the input vector to the neuron. Our experiments show that, with our optimization algorithm, the model can be trained to a higher accuracy level across different initial settings. We demonstrate the efficiency of our training algorithm using ResNet-20 and a toy neural network on different benchmark datasets with different initializations. NSGD improves the accuracy of ResNet-20 from 91.96\% to 92.20\% on the CIFAR-10 dataset.
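To make the described update concrete, a minimal sketch of one per-neuron step in the spirit of NLMS is given below; the exact form (the squared $\ell_2$ norm versus the $\ell_1$ norm in the denominator, the stabilizing constant $\epsilon$, and the symbols $\mu$, $x_t$) is an illustrative assumption, not the paper's definitive formulation:
\[
w_{t+1} \;=\; w_t \;-\; \frac{\mu}{\epsilon + \|x_t\|_2^2}\,\nabla_w \mathcal{L}(w_t)
\qquad\text{or}\qquad
w_{t+1} \;=\; w_t \;-\; \frac{\mu}{\epsilon + \|x_t\|_1}\,\nabla_w \mathcal{L}(w_t),
\]
where $x_t$ is the input vector to the neuron and, unlike NLMS, the error term does not appear in the normalization.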