Deep neural networks, with their numerous parameters, are often prone to over-fitting, so regularization plays an important role in generalization. L1 and L2 regularizers are common regularization tools in machine learning owing to their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization on deep neural networks trained with stochastic gradient descent easily causes learning to fail, which limits the generalization ability of the underlying networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients remain coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed endures strong regularization and significantly improves both accuracy and compression, which could not be achieved otherwise.
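The abstract only sketches the gating criterion, so the following is a minimal, hypothetical illustration of one way such a scheme could look: the regularization term is applied in an SGD step only when the regularized gradient stays coherent with the plain loss gradient. The cosine-similarity test, the threshold `tau`, and the strength `lam` are assumptions for illustration, not the paper's exact method.

```python
import numpy as np


def sgd_step(w, grad_loss, lam=1e-2, lr=0.1, tau=0.9, use_l1=True):
    """One SGD step sketching gradient-coherent strong regularization.

    Hypothetical criterion: apply the strong L1/L2 penalty only when the
    regularized gradient remains coherent (high cosine similarity) with
    the plain loss gradient.
    """
    # Gradient of the L1 (sign) or L2 (identity) penalty, scaled by lam.
    grad_reg = lam * (np.sign(w) if use_l1 else w)
    grad_total = grad_loss + grad_reg

    # Cosine similarity between the plain and regularized gradients.
    denom = np.linalg.norm(grad_loss) * np.linalg.norm(grad_total) + 1e-12
    coherence = float(np.dot(grad_loss, grad_total) / denom)

    # Impose the regularizer only while the two directions stay coherent;
    # otherwise fall back to the unregularized gradient.
    g = grad_total if coherence >= tau else grad_loss
    return w - lr * g


# Toy usage: one update on a small weight vector with a made-up loss gradient.
w = np.array([0.5, -0.3, 0.05])
grad_loss = np.array([0.1, -0.2, 0.0])
w = sgd_step(w, grad_loss)
```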