Generalization is one of the critical issues in machine learning. However, traditional methods such as uniform convergence are not powerful enough to fully explain generalization, since they may yield vacuous bounds even in the overparameterized linear regression regime. An alternative is to analyze the generalization dynamics and derive algorithm-dependent bounds, e.g., via stability. Unfortunately, the stability-based bound still falls far short of explaining the remarkable generalization ability of neural networks because of its coarse-grained treatment of signal and noise. Inspired by the observation that neural networks converge slowly when fitting noise, we propose decomposing the excess risk dynamics and applying a stability-based bound only to the variance part (which measures how the model performs on pure noise). We provide two applications of the framework: a linear case (overparameterized linear regression with gradient descent) and a non-linear case (matrix recovery with gradient flow). Under the decomposition framework, the new bound accords better with theoretical and empirical evidence than both the stability-based bound and the uniform convergence bound.
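To make the proposal concrete, a minimal schematic of the decomposition follows; the notation $\mathcal{B}(t)$, $\mathcal{V}(t)$, and the noise model $y = f^*(x) + \varepsilon$ are illustrative assumptions for this sketch, not necessarily the paper's exact definitions:
\[
  \mathcal{E}(t) \;\lesssim\;
  \underbrace{\mathcal{B}(t)}_{\text{bias: error from fitting the noiseless signal } f^*}
  \;+\;
  \underbrace{\mathcal{V}(t)}_{\text{variance: error from fitting the pure noise } \varepsilon}
\]
where $\mathcal{E}(t)$ denotes the excess risk at training time $t$. Under this split, the stability-based argument is applied only to $\mathcal{V}(t)$, where the slow convergence of the dynamics on pure noise keeps the resulting bound non-vacuous.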