Existing generalization bounds fail to explain crucial factors that drive generalization of modern neural networks. Since such bounds often hold uniformly over all parameters, they suffer from over-parametrization and fail to account for the strong inductive bias of initialization and stochastic gradient descent. As an alternative, we propose a novel optimal transport interpretation of the generalization problem. This allows us to derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. Therefore, our bounds are agnostic to the parametrization of the model and work well when the number of training samples is much smaller than the number of parameters. With small modifications, our approach yields accelerated rates for data on low-dimensional manifolds, and guarantees under distribution shifts. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
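The central quantity in these bounds is the local Lipschitz regularity of the learned prediction function around the training data. As a rough illustration only (not the paper's estimator), one can probe this quantity empirically by sampling small perturbations around an input and recording the largest ratio of output change to input change; the network architecture, ball radius, and sample counts below are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder prediction function: a small random two-layer ReLU network.
# In practice this would be the trained model whose bound is evaluated.
W1, b1 = rng.normal(size=(64, 10)), rng.normal(size=64)
W2, b2 = rng.normal(size=(3, 64)), rng.normal(size=3)

def predict(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    return W2 @ h + b2

def local_lipschitz(x, radius=0.1, n_samples=256):
    """Crude Monte-Carlo estimate of the Lipschitz constant of `predict`
    within a ball of the given radius around the data point `x`."""
    fx = predict(x)
    best = 0.0
    for _ in range(n_samples):
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform() / np.linalg.norm(delta)  # rescale into the ball
        ratio = np.linalg.norm(predict(x + delta) - fx) / np.linalg.norm(delta)
        best = max(best, ratio)
    return best

# Evaluate the local regularity on a handful of synthetic inputs.
X_train = rng.normal(size=(5, 10))
for i, x in enumerate(X_train):
    print(f"sample {i}: estimated local Lipschitz constant = {local_lipschitz(x):.2f}")
```

Because the estimate is computed per training point, it is instance-dependent and never references the number of model parameters, which is the sense in which the bounds are agnostic to the parametrization.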