Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks

We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies to investigate a scenario wherein both the number of samples, $n$, and parameters, $p$, diverge to infinity and derive an asymptotic risk at the limit. However, these theorems are only valid for linear-in-feature models, such as generalized linear regression, kernel regression, and shallow neural networks. Hence, it is difficult to investigate a wider class of nonlinear models, including deep neural networks with three or more layers. In this study, we consider a likelihood maximization problem without the model constraints and analyze the upper bound of an asymptotic risk of an estimator with penalization. Technically, we combine a property of the Fisher information matrix with an extended Marchenko-Pastur law and associate the combination with empirical process techniques. The derived bound is general, as it describes both the double descent and the regularized risk curves, depending on the penalization. Our results are valid without the linear-in-feature constraints on models and allow us to derive the general spectral distributions of a Fisher information matrix from the likelihood. We demonstrate that several explicit models, such as parallel deep neural networks and ensemble learning, are in agreement with our theory. This result indicates that even large and deep models have a small asymptotic risk if they exhibit a specific structure, such as divisibility. To verify this finding, we conduct a real-data experiment with parallel deep neural networks. Our results expand the applicability of the asymptotic risk analysis, and may also contribute to the understanding and application of deep learning.
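
As a point of reference for the two curve shapes mentioned above, the following is a minimal sketch of double descent in the simplest linear-in-feature setting, which is the restricted case that earlier analyses cover and not the general likelihood model or the bound studied here. Everything in it is an illustrative assumption: the helper name `excess_risk`, the isotropic Gaussian design, the signal `beta`, and the values of `n`, `d_total`, `sigma`, and `lam`. With no penalty (`lam = 0`) the minimum-norm least-squares risk should peak near $p = n$, and a ridge penalty should flatten that peak.

```python
import numpy as np


def excess_risk(n, p, lam, d_total=200, sigma=0.5, trials=30, seed=0):
    """Median excess test risk when fitting only the first p of d_total features.

    Illustrative toy, not the paper's estimator:
    lam == 0 gives the minimum-norm least-squares fit (pseudoinverse);
    lam > 0 minimizes (1/n) * ||y - X b||^2 + lam * ||b||^2 (ridge).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical true signal with squared norm close to 1.
    beta = rng.normal(size=d_total) / np.sqrt(d_total)
    risks = []
    for _ in range(trials):
        X = rng.normal(size=(n, d_total))            # isotropic Gaussian design
        y = X @ beta + sigma * rng.normal(size=n)    # noisy linear responses
        Xp = X[:, :p]                                # the model sees p features only
        if lam == 0.0:
            beta_hat = np.linalg.pinv(Xp) @ y
        else:
            gram = Xp.T @ Xp + n * lam * np.eye(p)
            beta_hat = np.linalg.solve(gram, Xp.T @ y)
        # For x ~ N(0, I), the excess prediction risk is the squared error on
        # the used coordinates plus the signal energy in the unused ones.
        risks.append(np.sum((beta_hat - beta[:p]) ** 2) + np.sum(beta[p:] ** 2))
    # The median is used because the unpenalized risk is heavy-tailed near p = n.
    return float(np.median(risks))


if __name__ == "__main__":
    n = 50
    for p in (10, 25, 50, 100, 200):
        print(p, excess_risk(n, p, lam=0.0), excess_risk(n, p, lam=0.1))
```

The median over trials is reported rather than the mean because the unpenalized risk is heavy-tailed near $p = n$. The sketch only reproduces the qualitative shapes in a linear model and says nothing about the tightness of the bound for deep models.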
