Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the corresponding generalization guarantees may not hold when the data are noisy. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the $L_2$ estimation error over the GD iterations, which stays bounded away from zero unless a delicate early-stopping scheme is applied. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with $\ell_2$ regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of the $L_2$ estimation error can be achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves training robustness and works for a wider range of neural networks.
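To make the setting concrete, the following is a minimal NumPy sketch, not the paper's experiment code: full-batch GD on an overparametrized one-hidden-layer ReLU network with an $\ell_2$ penalty, compared against kernel ridge regression with the empirical neural tangent kernel Gram matrix at initialization. The target function, width, step size, regularization strength, the ridge scaling, and the choice to penalize the deviation from initialization are illustrative assumptions rather than the paper's exact specification.

\begin{verbatim}
import numpy as np

# Illustrative sketch only: l2-regularized GD on a one-hidden-layer ReLU network
# (NTK-style 1/sqrt(m) scaling, fixed output layer), compared with kernel ridge
# regression using the empirical NTK at initialization. All hyperparameters are
# assumptions for this demo, not values from the paper.

rng = np.random.default_rng(0)

# Noisy regression data on the unit circle: y = sin(2*theta) + noise.
n = 50
theta = rng.uniform(0.0, 2 * np.pi, size=n)
x = np.stack([np.cos(theta), np.sin(theta)], axis=1)          # (n, 2) inputs
y = np.sin(2 * theta)[:, None] + 0.3 * rng.standard_normal((n, 1))

# Antisymmetric initialization so the network output is zero at t = 0.
m = 2000                                                      # hidden width, m >> n
W_half = rng.standard_normal((m // 2, 2))
W0 = np.vstack([W_half, W_half])                              # (m, 2) hidden weights
a = np.vstack([np.ones((m // 2, 1)), -np.ones((m // 2, 1))])  # fixed output signs
lam, lr, T = 1e-3, 1.0, 5000                                  # l2 strength, step size, iters

def net(W, x):
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r^T x)
    return np.maximum(x @ W.T, 0.0) @ a / np.sqrt(m)

# --- l2-regularized gradient descent on the hidden-layer weights ---
# Penalizing the deviation from initialization is an illustrative simplification.
W = W0.copy()
for _ in range(T):
    pre = x @ W.T                                             # (n, m) pre-activations
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y         # (n, 1) residuals
    # Gradient of 1/(2n)*||f(X) - y||^2 + (lam/2)*||W - W0||_F^2
    grad = ((pre > 0) * (resid @ a.T)).T @ x / (n * np.sqrt(m)) + lam * (W - W0)
    W -= lr * grad

# --- Kernel ridge regression with the empirical NTK at initialization ---
ind = (x @ W0.T > 0).astype(float)                            # activation patterns at init
K = (ind @ ind.T) * (x @ x.T) / m                             # empirical NTK Gram matrix
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)           # ridge scaling n*lam: a choice

print("GD  training MSE:", float(np.mean((net(W, x) - y) ** 2)))
print("KRR training MSE:", float(np.mean((K @ alpha - y) ** 2)))
print("max |f_GD - f_KRR| on training points:",
      float(np.max(np.abs(net(W, x) - K @ alpha))))
\end{verbatim}

Under these assumptions the two fitted functions nearly coincide on the training points, illustrating claim (1) of the abstract; the noise level 0.3 makes the role of the $\ell_2$ penalty in avoiding interpolation of the noise visible when it is set to zero.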