In some studies \citep[e.g.,][]{zhang2016understanding} of deep learning, it is observed that over-parametrized deep neural networks achieve a small test error even when the training error is almost zero. Despite numerous works towards understanding this so-called ``double descent'' phenomenon \citep[e.g.,][]{belkin2018reconciling,belkin2019two}, in this paper, we turn to another way to enforce zero training error (without over-parametrization) through a data interpolation mechanism. Specifically, we consider a class of interpolated weighting schemes in the nearest neighbors (NN) algorithms. By carefully characterizing the multiplicative constant in the statistical risk, we reveal a U-shaped performance curve with respect to the level of data interpolation in both classification and regression setups. This sharpens the existing result \citep{belkin2018does} that zero training error does not necessarily jeopardize predictive performance, and establishes a counter-intuitive result that a mild degree of data interpolation actually {\em strictly} improves the prediction performance and statistical stability over those of the (un-interpolated) $k$-NN algorithm. In the end, we also discuss the universality of our results, e.g., under a change of distance measure and with corrupted test data.
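To fix ideas, one representative instance of such an interpolated weighting scheme, in the spirit of \citep{belkin2018does} (a hedged illustration; the exact scheme analyzed in this paper may differ), assigns the $i$-th nearest neighbor $x_{(i)}$ of a query point $x$ the weight
\[
w_i(x) \;\propto\; \|x - x_{(i)}\|^{-\gamma}, \qquad i = 1, \dots, k,
\]
where $\gamma \ge 0$ indexes the level of data interpolation: $\gamma = 0$ recovers the standard (un-interpolated) $k$-NN weights, whereas any $\gamma > 0$ lets the weight on an exactly matching training point dominate, so the fitted function passes through every training sample and the training error is zero.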