Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

We revisit the on-average algorithmic stability of gradient descent (GD) for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the neural tangent kernel (NTK) or Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here, we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
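To make the setting concrete, the following is a minimal sketch of full-batch GD with early stopping on a shallow network for noisy regression. It is not the construction from the paper: the tanh activation, the fixed output weights scaled by 1/sqrt(m), the squared loss, the validation-based stopping rule, and all names (`shallow_net`, `gd_early_stopping`, `dist_from_init`) are assumptions chosen for illustration. The Frobenius distance from initialisation is reported only as a rough stand-in for the "relative norm" mentioned above.

```python
import numpy as np


def shallow_net(W, u, X):
    """One-hidden-layer network f(x) = sum_k u_k * tanh(<w_k, x>) (assumed form)."""
    return np.tanh(X @ W.T) @ u


def gd_early_stopping(X, y, X_val, y_val, m=64, lr=0.1, max_steps=500, seed=0):
    """Full-batch GD on the hidden layer only; keeps the iterate with the lowest validation loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W0 = rng.standard_normal((m, d))                   # random initialisation
    u = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed output layer (assumption)
    W = W0.copy()

    best_W, best_step = W.copy(), 0
    best_val = float(np.mean((shallow_net(W, u, X_val) - y_val) ** 2))

    for t in range(1, max_steps + 1):
        H = np.tanh(X @ W.T)                           # hidden activations, shape (n, m)
        resid = H @ u - y                              # residuals, shape (n,)
        # Gradient of (1 / 2n) * sum_i resid_i^2 with respect to W, shape (m, d).
        G = (resid[:, None] * (1.0 - H ** 2) * u[None, :]).T @ X / n
        W = W - lr * G

        val = float(np.mean((shallow_net(W, u, X_val) - y_val) ** 2))
        if val < best_val:                             # early stopping: remember the best iterate
            best_W, best_step, best_val = W.copy(), t, val

    return {
        "W": best_W,
        "W0": W0,
        "u": u,
        "step": best_step,
        "val_loss": best_val,
        "dist_from_init": float(np.linalg.norm(best_W - W0)),
    }


# Synthetic regression data with label noise (purely illustrative).
rng = np.random.default_rng(42)
X_train = rng.standard_normal((200, 5))
X_valid = rng.standard_normal((100, 5))
y_train = np.sin(X_train[:, 0]) + 0.3 * rng.standard_normal(200)
y_valid = np.sin(X_valid[:, 0]) + 0.3 * rng.standard_normal(100)

result = gd_early_stopping(X_train, y_train, X_valid, y_valid)
print(result["step"], result["val_loss"], result["dist_from_init"])
```

Stopping on a held-out set is only one possible stopping rule; it is used here because it needs no knowledge of the noise level.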

