Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. Since such bounds often hold uniformly over all parameters, they suffer from over-parametrization and fail to account for the fact that the set of parameters reached during initialization and training is far more restricted than the entire parameter space. As an alternative, we propose a novel optimal transport interpretation of the generalization problem. This allows us to derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. Consequently, our bounds are agnostic to the parametrization of the model and work well when the number of training samples is much smaller than the number of parameters. With small modifications, our approach yields accelerated rates for data on low-dimensional manifolds and guarantees under distribution shifts. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
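To make the central quantity concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of how one might estimate the local Lipschitz regularity of a learned prediction function in the data space: at each training input, the spectral norm of the input Jacobian serves as a pointwise Lipschitz estimate, independent of how the model is parametrized. The network architecture, data, and the Jacobian-based estimator here are placeholder assumptions for illustration only.

```python
# Illustrative sketch: per-instance local Lipschitz estimates of a learned
# prediction function f in the data space.  Model and data are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder prediction function: a small MLP on 20-dimensional inputs.
f = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))

# Placeholder "training" inputs.
X = torch.randn(32, 20)

def local_lipschitz(f, x):
    """Spectral norm of the Jacobian of f at x: a pointwise (local)
    Lipschitz estimate of f in the data space."""
    J = torch.autograd.functional.jacobian(f, x)          # shape (10, 20)
    return torch.linalg.matrix_norm(J, ord=2).item()      # largest singular value

# Instance-dependent quantities: one local regularity value per training
# point, rather than a single worst-case constant over all parameters.
local_constants = [local_lipschitz(f, x) for x in X]
print(f"mean local Lipschitz estimate: {sum(local_constants) / len(local_constants):.3f}")
print(f"max  local Lipschitz estimate: {max(local_constants):.3f}")
```

Because these estimates are computed from the learned function itself rather than from a uniform bound over the parameter space, they remain informative in the over-parametrized regime described above.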