Unlike the conventional wisdom in statistical learning theory, the test error of a deep neural network (DNN) often demonstrates double descent: as the model complexity increases, it first follows a classical U-shaped curve and then shows a second descent. Through bias-variance decomposition, recent studies have revealed that the bell-shaped variance is the major cause of model-wise double descent (when the DNN is widened gradually). This paper investigates epoch-wise double descent, i.e., the test error of a DNN also shows double descent as the number of training epochs increases. By extending the bias-variance analysis to epoch-wise double descent of the zero-one loss, we find, surprisingly, that the variance alone, without the bias, varies consistently with the test error. Inspired by this result, we propose a novel metric, optimization variance (OV), to measure the diversity of model updates caused by the stochastic gradients of random training batches drawn in the same iteration. OV can be estimated using samples from the training set only, yet it correlates well with the (unknown) \emph{test} error, and hence early stopping may be achieved without using a validation set.
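To make the idea concrete, the following is a minimal sketch of one plausible way to estimate such a quantity. It assumes OV is computed as the normalized variance of the candidate parameter updates produced by gradients of several random batches drawn at the same iteration; the toy linear model, function names, and normalization are illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a linear model; stand-ins for the DNN and its training set.
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w = np.zeros(20)  # model parameters at the current training iteration

def batch_gradient(w, idx):
    """Mean-squared-error gradient on the batch given by indices idx."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def optimization_variance(w, n_batches=32, batch_size=64):
    """Hypothetical OV estimate: relative variance of the updates that
    gradients of different random batches, drawn at the same iteration,
    would apply to the model. Uses training data only."""
    grads = np.stack([
        batch_gradient(w, rng.choice(len(X), size=batch_size, replace=False))
        for _ in range(n_batches)
    ])
    mean_grad = grads.mean(axis=0)
    # Variance of the per-batch gradients around their mean, normalized
    # by the squared mean gradient so the statistic is scale-free.
    return ((grads - mean_grad) ** 2).sum(axis=1).mean() / (mean_grad ** 2).sum()

ov = optimization_variance(w)
print(ov)
```

Tracking this statistic over epochs on the training set is the kind of signal the abstract suggests could replace a held-out validation set for early stopping.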