Double descent is a surprising phenomenon in machine learning in which, as the number of model parameters grows relative to the number of data points, test error drops as models grow ever larger into the highly overparameterized (data-undersampled) regime. This drop in test error flies in the face of classical learning theory on overfitting and has arguably underpinned the success of large models in machine learning. This non-monotonic behavior of test loss depends on the number of data points, the dimensionality of the data, and the number of model parameters. Here, we briefly describe double descent, then provide an explanation of why double descent occurs, in an informal and approachable manner requiring only familiarity with linear algebra and introductory probability. We provide visual intuition using polynomial regression, then mathematically analyze double descent with ordinary linear regression and identify three interpretable factors that, when simultaneously present, together create double descent. We demonstrate that double descent occurs on real data when using ordinary linear regression, then demonstrate that double descent does not occur when any of the three factors is ablated. We use this understanding to shed light on recent observations in nonlinear models concerning superposition and double descent. Code is publicly available.
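To make the claim about ordinary linear regression concrete, the following is a minimal sketch (not the paper's released code) of how double descent can be reproduced with minimum-norm least squares on synthetic data; the constants (`DIM`, `N_TRAIN`, the noise level) and the feature-subsetting scheme are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of double descent with ordinary linear regression,
# using the minimum-norm (pseudoinverse) least-squares solution.
# All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

DIM = 100        # total number of informative features in the data
N_TRAIN = 20     # number of training samples (interpolation threshold)
N_TEST = 1000
NOISE_STD = 0.5

# Ground-truth linear relationship with observation noise.
w_true = rng.normal(size=DIM) / np.sqrt(DIM)
X_train = rng.normal(size=(N_TRAIN, DIM))
X_test = rng.normal(size=(N_TEST, DIM))
y_train = X_train @ w_true + NOISE_STD * rng.normal(size=N_TRAIN)
y_test = X_test @ w_true + NOISE_STD * rng.normal(size=N_TEST)

# Sweep the number of model parameters p by fitting on the first p
# features only; pinv returns the minimum-norm solution once p > N_TRAIN.
for p in range(1, DIM + 1):
    w_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    test_mse = np.mean((X_test[:, :p] @ w_hat - y_test) ** 2)
    if p % 10 == 0 or p == N_TRAIN:
        print(f"p={p:3d}  test MSE={test_mse:8.3f}")

# Test error typically peaks near p == N_TRAIN (the interpolation
# threshold) and then falls again as p grows toward DIM: double descent.
```

Plotting the test MSE against p under these assumptions typically shows the characteristic peak at the interpolation threshold followed by a second descent in the overparameterized regime.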