Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks

We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies of the regime in which both the number of samples, $n$, and the number of parameters, $p$, diverge to infinity, with the asymptotic risk derived in the limit. However, these results are valid only for linear-in-feature models, such as generalized linear regression, kernel regression, and shallow neural networks. Hence, it is difficult to use them to investigate a wider class of nonlinear models, including deep neural networks with three or more layers. In this study, we consider a likelihood maximization problem without such model constraints and analyze an upper bound on the asymptotic risk of a penalized estimator. Technically, we combine a property of the Fisher information matrix with an extended Marchenko-Pastur law and connect the combination to empirical process techniques. The derived bound is general: depending on the penalization, it describes both the double descent risk curve and the regularized risk curve. Our results hold without linear-in-feature constraints on the models and allow us to derive general spectral distributions of the Fisher information matrix from the likelihood. We demonstrate that several explicit models, such as parallel deep neural networks, ensemble learning, and residual networks, are in agreement with our theory. This result indicates that even large and deep models have a small asymptotic risk if they exhibit a specific structure, such as divisibility. To verify this finding, we conduct a real-data experiment with parallel deep neural networks. Our results expand the applicability of asymptotic risk analysis and may also contribute to the understanding and application of deep learning.
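
As a rough illustration of the two risk curves mentioned above, the following sketch simulates the classical linear-in-feature setting that earlier theory covers, not the general likelihood models studied here. It assumes isotropic Gaussian features, a unit-norm true parameter, and Gaussian noise; the function name `excess_risk` and every parameter value are hypothetical choices made only for this example. With no penalty (minimum-norm least squares) the estimated risk should spike near $p = n$ and fall again for $p > n$, while a ridge penalty should flatten the spike.

```python
import numpy as np


def excess_risk(n, p, lam=0.0, sigma=0.5, trials=50, seed=0):
    """Monte Carlo estimate of E||beta_hat - beta||^2.

    For isotropic features this equals the excess prediction risk.
    lam = 0 gives the minimum-norm least-squares fit; lam > 0 gives ridge
    regression with penalty n * lam * ||beta||^2 (an illustrative scaling).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        beta = rng.standard_normal(p)
        beta /= np.linalg.norm(beta)  # fix the signal strength at ||beta|| = 1
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        if lam == 0.0:
            # Pseudoinverse: ordinary least squares when p < n,
            # minimum-norm interpolator when p >= n.
            beta_hat = np.linalg.pinv(X) @ y
        else:
            gram = X.T @ X + n * lam * np.eye(p)
            beta_hat = np.linalg.solve(gram, X.T @ y)
        total += float(np.sum((beta_hat - beta) ** 2))
    return total / trials


if __name__ == "__main__":
    n = 40
    for p in (10, 20, 40, 80, 160):
        unpenalized = excess_risk(n, p)
        ridge = excess_risk(n, p, lam=0.1)
        print(f"p/n = {p / n:4.2f}  unpenalized = {unpenalized:8.3f}  ridge = {ridge:6.3f}")
```

Reading the printed table down the `unpenalized` column shows the descent, peak, and second descent as $p/n$ crosses 1; the `ridge` column shows how penalization removes the peak. The sketch says nothing about deep networks, which is precisely the gap the abstract describes.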
