In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study gradient descent and gradient flow (i.e., gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of this optimization problem. Our results generalize previous results for logistic regression with one-layer or multi-layer linear networks, and provide more quantitative convergence guarantees under weaker assumptions than previous results for smooth homogeneous neural networks. We conduct several experiments on the MNIST and CIFAR-10 datasets to support our theoretical findings. Finally, as the margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
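To make the quantities mentioned above concrete, the following is a minimal sketch of how the normalized margin and the associated constrained problem are commonly set up for a homogeneous model; the notation $\Phi$, $\theta$, $q_n$, and $\bar{\gamma}$ is our own shorthand and is not fixed by the abstract. For an $L$-homogeneous model $\Phi(\theta; x)$, i.e., $\Phi(c\theta; x) = c^L \Phi(\theta; x)$ for all $c > 0$, with binary labels $y_n \in \{\pm 1\}$, one can define the per-example margin and the normalized margin as
\[
  q_n(\theta) = y_n \, \Phi(\theta; x_n),
  \qquad
  \bar{\gamma}(\theta) = \frac{\min_n q_n(\theta)}{\|\theta\|_2^{L}},
\]
and a natural margin-maximization problem of the kind referred to in the abstract can then be phrased as the constrained program
\[
  \min_{\theta} \; \tfrac{1}{2}\|\theta\|_2^2
  \quad \text{s.t.} \quad q_n(\theta) \ge 1 \ \ \text{for all } n ,
\]
whose KKT points serve as the reference points for the stated convergence results.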