Regularization plays an important role in the generalization of deep neural networks, which are often prone to overfitting due to their large number of parameters. L1 and L2 regularizers are common regularization tools in machine learning because of their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization with stochastic gradient descent on deep neural networks easily fails, which limits the generalization ability of the underlying networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients remain coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed withstands strong regularization and significantly improves both accuracy and compression (up to 9.9x), which could not be achieved otherwise.
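The sketch below illustrates the general idea of gradient-coherent strong regularization in a single training step: the regularization gradient is added only when doing so does not destroy the direction of the data-loss gradient. The abstract does not specify the coherence criterion, so the per-tensor cosine-similarity test, the `coherence_threshold` parameter, and the use of an L1 subgradient are illustrative assumptions, not the paper's actual procedure.

```python
# Minimal sketch (PyTorch), assuming coherence is measured by cosine
# similarity between the data-loss gradient and the regularized gradient.
# The paper's exact criterion may differ.
import torch
import torch.nn.functional as F

def gradient_coherent_l1_step(model, loss_fn, x, y, optimizer,
                              l1_strength=1e-3, coherence_threshold=0.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # gradients of the data loss only

    for p in model.parameters():
        if p.grad is None:
            continue
        data_grad = p.grad.detach()
        # Subgradient of the strong L1 penalty lambda * |w| (assumption).
        reg_grad = l1_strength * torch.sign(p.detach())
        combined = data_grad + reg_grad
        # Assumed coherence test: keep the penalty only if the combined
        # direction still agrees with the data-loss direction.
        coherence = F.cosine_similarity(
            data_grad.flatten(), combined.flatten(), dim=0)
        if coherence > coherence_threshold:
            p.grad.add_(reg_grad)  # impose strong regularization
        # Otherwise, skip regularization for this tensor on this step.

    optimizer.step()
    return loss.item()
```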