Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, but fall short of investigating the mechanisms that control such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends that correlate strongly with the test error. By connecting parameter-space and input-space gradients of SGD around a critical point, we isolate two important factors -- loss landscape curvature and the distance of the parameters from initialization -- which respectively control the optimization dynamics around that point and bound the complexity of the model function, even beyond the training data. Our study offers novel insights into implicit regularization via overparameterization and into the effective model complexity of networks trained in practice.
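Since the abstract centers on measuring an empirical Lipschitz constant through input-space gradients, the sketch below illustrates one common way such a quantity can be estimated: lower-bounding the Lipschitz constant of a trained network by the largest input-gradient norm observed over a set of evaluation points. The model, the data tensor, the scalar reduction of the output, and the choice of the L2 norm are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch (assumptions noted above): estimate an empirical Lipschitz
# constant as the maximum input-gradient norm over a set of inputs.
import torch
import torch.nn as nn


def empirical_lipschitz(model: nn.Module, inputs: torch.Tensor) -> float:
    """Lower-bound the Lipschitz constant of `model` by the largest
    input-gradient norm observed over `inputs`."""
    model.eval()
    max_grad_norm = 0.0
    for x in inputs:
        x = x.unsqueeze(0).requires_grad_(True)
        out = model(x)
        # Reduce the output to a scalar (here, the top logit) so autograd
        # returns a single input gradient for this example.
        scalar = out.max()
        (grad,) = torch.autograd.grad(scalar, x)
        max_grad_norm = max(max_grad_norm, grad.flatten().norm(p=2).item())
    return max_grad_norm


# Usage (hypothetical model and data):
# model = MyResNet()
# lip_hat = empirical_lipschitz(model, test_images)
```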