On the Lipschitz Constant of Deep Networks and Double Descent

Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and effective model complexity for networks trained in practice.
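As an illustration of what an "empirical Lipschitz constant" can mean in practice, the sketch below takes the largest Euclidean norm of the input-space gradient over a finite set of sample points. This is one common estimator and is an assumption here: the abstract does not say which estimator the study uses. The tiny one-hidden-layer tanh network and the names `mlp`, `input_gradient`, and `empirical_lipschitz` are hypothetical, chosen only to keep the example self-contained.

```python
import math
import random


def mlp(x, W1, b1, w2, b2):
    """Scalar-output toy network: f(x) = w2 . tanh(W1 x + b1) + b2."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def input_gradient(x, W1, b1, w2):
    """Analytic gradient of f with respect to the input x."""
    grad = [0.0] * len(x)
    for row, b, v in zip(W1, b1, w2):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        scale = v * (1.0 - math.tanh(pre) ** 2)  # chain rule through tanh
        for j, w in enumerate(row):
            grad[j] += scale * w
    return grad


def empirical_lipschitz(points, W1, b1, w2):
    """Largest input-gradient norm over the given points.

    This only probes the sampled points, so it is a lower bound on the
    true Lipschitz constant of the network (assumed estimator).
    """
    return max(
        math.sqrt(sum(g * g for g in input_gradient(x, W1, b1, w2)))
        for x in points
    )


# Hypothetical toy setup: 3 inputs, 4 hidden units, random weights.
random.seed(0)
W1 = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
b1 = [random.gauss(0.0, 1.0) for _ in range(4)]
w2 = [random.gauss(0.0, 1.0) for _ in range(4)]
b2 = 0.0
points = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(50)]

estimate = empirical_lipschitz(points, W1, b1, w2)
print("empirical Lipschitz estimate:", estimate)
```

Because tanh has slope at most 1, this estimate can never exceed the product of the Frobenius norm of `W1` and the Euclidean norm of `w2`, which gives a quick sanity check. Tracking such a quantity across model sizes is the kind of measurement the abstract describes, although the toy network here is far smaller than anything studied there.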

