Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining a dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms. The bounds depend explicitly on the loss along the training trajectory and apply to a wide range of network architectures, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). Compared with other algorithm-dependent generalization estimates, such as uniform-stability-based bounds, our bounds do not require $L$-smoothness of the nonconvex loss function and apply directly to SGD rather than to stochastic gradient Langevin dynamics (SGLD). Numerical results show that our bounds are non-vacuous and robust to changes in the optimizer and network hyperparameters.
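For context, a standard norm-based Rademacher complexity bound (background only, not the specific bound derived here) states that, for a loss $\ell$ taking values in $[0,1]$ and an i.i.d. sample of size $n$, with probability at least $1-\delta$, every $f$ in the hypothesis class $\mathcal{F}$ satisfies
\[
\mathbb{E}\,\ell\bigl(f(x),y\bigr) \;\le\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i),y_i\bigr) \;+\; 2\,\mathfrak{R}_n(\ell\circ\mathcal{F}) \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
\]
where $\mathfrak{R}_n(\ell\circ\mathcal{F})$ denotes the Rademacher complexity of the loss class. For MLPs and CNNs this complexity can in turn be bounded in terms of layerwise parameter norms, which is the type of estimate combined with the trajectory-wise norm control described above.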