Double descent is a phenomenon of over-parameterized statistical models, such as deep neural networks, whose risk function exhibits a re-descending property. As model complexity increases, the risk first traces the familiar U-shape of the bias-variance trade-off; when the number of parameters reaches the number of observations, the model interpolates the data and the risk can become unbounded; finally, in the over-parameterized region, the risk descends again -- the double descent effect. Our goal is to show that this phenomenon has a natural Bayesian interpretation. We also show that it is not in conflict with the traditional Occam's razor: simpler models are preferred to complex ones, all else being equal. Our theoretical foundations rest on Bayesian model selection and the Dickey-Savage density ratio, and we connect generalized ridge regression and global-local shrinkage methods with double descent. We illustrate our approach for high-dimensional neural networks and provide detailed treatments of the infinite Gaussian means model and non-parametric regression. Finally, we conclude with directions for future research.
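As a numerical illustration of the risk curve described above, the following is a minimal sketch (our own, not a result from the paper): it fits minimum-norm least squares, i.e. the ridgeless limit of ridge regression, to the first p of D candidate Gaussian features and reports the out-of-sample risk as p passes through the interpolation threshold p = n. All quantities here (n = 40 training points, D = 160 candidate features of which the first 10 carry signal, noise level sigma = 0.5) are illustrative assumptions.

```python
# Illustrative sketch of double descent for minimum-norm least squares
# on a Gaussian random-design regression problem (assumed setup, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

n, D, p_signal, sigma = 40, 160, 10, 0.5      # train size, candidate features, signal dim, noise sd
beta = np.zeros(D)
beta[:p_signal] = rng.normal(size=p_signal) / np.sqrt(p_signal)

def simulate(n_obs):
    """Draw a Gaussian design with D features and noisy responses from the true beta."""
    X = rng.normal(size=(n_obs, D))
    y = X @ beta + sigma * rng.normal(size=n_obs)
    return X, y

def test_risk(p, n_rep=100, n_test=500):
    """Average out-of-sample squared error when fitting only the first p features."""
    risks = []
    for _ in range(n_rep):
        X, y = simulate(n)
        Xt, yt = simulate(n_test)
        # pinv gives OLS for p <= n and the minimum-l2-norm interpolator for p > n,
        # i.e. the lambda -> 0+ limit of ridge regression.
        b_hat = np.linalg.pinv(X[:, :p]) @ y
        risks.append(np.mean((Xt[:, :p] @ b_hat - yt) ** 2))
    return float(np.mean(risks))

for p in [5, 10, 20, 35, 40, 45, 60, 100, 160]:
    print(f"p = {p:3d}  (p/n = {p/n:4.2f})   risk = {test_risk(p):8.2f}")
```

The printed risks trace the shape summarized in the abstract: the risk falls as the signal features enter the model, rises sharply as p approaches n = 40 (the interpolation threshold, where the least-squares variance blows up), and then re-descends in the over-parameterized regime p > n, where the minimum-norm solution implicitly shrinks the fit.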