The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which larger models are often observed to generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias decreases monotonically, as in the classical theory, the variance is unimodal (bell-shaped), first increasing and then decreasing with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with a random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias, while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
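The decomposition of risk into bias and variance described above can be illustrated with a minimal sketch. This is not the authors' experimental code: it assumes squared loss, noise-free test targets, and a toy two-layer linear model whose random first layer is frozen and whose second layer is fit by ridge regression. The names `bias_variance` and `fit_predict`, the widths, the sample sizes, and the ridge constant are all hypothetical choices made for illustration.

```python
import numpy as np


def bias_variance(preds, targets):
    """Estimate squared bias, variance, and risk under squared loss.

    preds:   (n_trials, n_test) predictions of models trained on
             independently drawn training sets.
    targets: (n_test,) noise-free test targets (an assumption of this sketch).
    """
    mean_pred = preds.mean(axis=0)                # average predictor over trials
    bias2 = np.mean((mean_pred - targets) ** 2)   # squared bias
    variance = np.mean(preds.var(axis=0))         # spread around the average predictor
    risk = np.mean((preds - targets) ** 2)        # average squared error
    return bias2, variance, risk


def fit_predict(width, rng, beta, x_test, n_train=30, ridge=1e-3):
    """Train one toy two-layer linear model and predict on x_test.

    The first layer is random and frozen; only the second layer is fit.
    Hypothetical setup, not the configuration used in the paper.
    """
    d = beta.shape[0]
    x_train = rng.standard_normal((n_train, d))
    y_train = x_train @ beta + 0.1 * rng.standard_normal(n_train)
    w = rng.standard_normal((d, width)) / np.sqrt(d)  # random first layer
    h = x_train @ w                                   # hidden features
    # Ridge-regularized least squares for the second layer.
    a = np.linalg.solve(h.T @ h + ridge * np.eye(width), h.T @ y_train)
    return (x_test @ w) @ a


rng = np.random.default_rng(0)
d = 20
beta = rng.standard_normal(d) / np.sqrt(d)  # ground-truth linear map
x_test = rng.standard_normal((200, d))
targets = x_test @ beta

results = {}
for width in (5, 10, 20, 40, 80):
    preds = np.stack(
        [fit_predict(width, rng, beta, x_test) for _ in range(50)]
    )
    results[width] = bias_variance(preds, targets)
    b2, var, risk = results[width]
    print(f"width={width:3d}  bias^2={b2:.4f}  variance={var:.4f}  risk={risk:.4f}")
```

Because `np.var` uses the population variance by default, the printed risk equals the sum of the squared bias and the variance up to floating-point error, which is the identity behind the statement that the risk curve is the sum of the two curves. How the variance column changes with width depends on the assumed setup and should not be read as a reproduction of the paper's results.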