We revisit on-average algorithmic stability of Gradient Descent (GD) for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-{\L}ojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, which demonstrates that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
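To make the training setup referred to above concrete, the following is a minimal, illustrative sketch of GD with early stopping for an overparameterised shallow ReLU network on a regression task with label noise. It is not the paper's exact construction: the width, step size, data distribution, validation-based stopping rule, and the choice to train only the hidden-layer weights are all assumptions made for the sake of a short, runnable example.

\begin{verbatim}
# Illustrative sketch only: GD with early stopping for a shallow ReLU
# network on noisy regression. Width, step size, stopping rule, and
# data generation are assumptions, not the paper's exact setting.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with label noise (assumed setup).
n, d, m = 200, 5, 512          # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # noisy targets

# Shallow network f(x) = a^T relu(W x); train W by GD, keep a fixed
# (a common simplification in shallow-network analyses).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def predict(W, X):
    return np.maximum(X @ W.T, 0.0) @ a              # relu(X W^T) a

# Hold out part of the data to decide when to stop early.
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

lr, T, patience = 0.2, 2000, 50
best_val, best_W, since_best = np.inf, W.copy(), 0
for t in range(T):
    H = np.maximum(X_tr @ W.T, 0.0)                  # hidden activations
    residual = H @ a - y_tr                          # prediction error
    # Gradient of the empirical squared loss w.r.t. W.
    grad_W = ((residual[:, None] * (H > 0)) * a).T @ X_tr / len(y_tr)
    W -= lr * grad_W
    val_mse = np.mean((predict(W, X_val) - y_val) ** 2)
    if val_mse < best_val:
        best_val, best_W, since_best = val_mse, W.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:                   # early stopping
            break

print(f"stopped at iteration {t}, validation MSE {best_val:.4f}")
\end{verbatim}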