Unraveling the reasons behind the remarkable success and exceptional generalization capabilities of deep neural networks remains a formidable challenge. Recent insights from random matrix theory, specifically the spectral analysis of weight matrices in deep neural networks, offer valuable clues on this issue. A key finding is that the generalization performance of a neural network is associated with the degree of heavy tails in the spectrum of its weight matrices. To capitalize on this discovery, we introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a more heavy-tailed spectrum in the weight matrices through regularization. First, we employ the Weighted Alpha and the Stable Rank as penalty terms, both of which are differentiable, enabling direct calculation of their gradients. To avoid over-regularization, we introduce two variants of the penalty function. Then, adopting a Bayesian perspective and leveraging knowledge from random matrix theory, we develop two further heavy-tailed regularization methods, using a power-law distribution and a Fréchet distribution as priors for the global spectrum and the maximum eigenvalue, respectively. We empirically show that heavy-tailed regularization outperforms conventional regularization techniques in terms of generalization performance.
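To make the first idea concrete, the following is a minimal sketch (assuming a PyTorch training setup; the function names `stable_rank_penalty` and `heavy_tailed_loss` and the hyperparameter `lam` are illustrative, not taken from the paper) of how a differentiable stable-rank term can be added to the task loss so that minimizing it pushes weight spectra toward heavier tails.

```python
import torch
import torch.nn as nn

def stable_rank_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Differentiable stable rank ||W||_F^2 / ||W||_2^2 of a weight matrix.
    A smaller stable rank means a few dominant singular values, i.e. a
    heavier-tailed spectrum, so using it as a penalty promotes heavy tails."""
    w = weight.reshape(weight.shape[0], -1)             # flatten conv kernels to 2-D
    fro_sq = (w ** 2).sum()                             # squared Frobenius norm
    spec_sq = torch.linalg.matrix_norm(w, ord=2) ** 2   # squared spectral (largest singular) norm
    return fro_sq / spec_sq

def heavy_tailed_loss(model: nn.Module, base_loss: torch.Tensor,
                      lam: float = 1e-3) -> torch.Tensor:
    """Add the stable-rank penalty of every weight matrix to the task loss."""
    penalty = sum(stable_rank_penalty(p) for _, p in model.named_parameters()
                  if p.dim() >= 2)
    return base_loss + lam * penalty
```

In this sketch the penalty is simply scaled by a fixed coefficient `lam`; the variants of the penalty function mentioned above would modify how this term is weighted to avoid over-regularization.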