Unraveling the reasons behind the remarkable success and exceptional generalization capabilities of deep neural networks presents a formidable challenge. Recent insights from random matrix theory, specifically those concerning the spectral analysis of weight matrices in deep neural networks, offer valuable clues to address this issue. A key finding indicates that the generalization performance of a neural network is associated with the heaviness of the tails in the spectrum of its weight matrices. To capitalize on this discovery, we introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a heavier-tailed spectrum of the weight matrices through regularization. First, we employ the Weighted Alpha and the Stable Rank as penalty terms, both of which are differentiable, enabling direct calculation of their gradients. To avoid over-regularization, we introduce two variants of the penalty function. Then, adopting a Bayesian statistics perspective and leveraging knowledge from random matrix theory, we develop two further heavy-tailed regularization methods, using the power-law distribution and the Fréchet distribution as priors for the global spectrum and the maximum eigenvalue, respectively. We empirically show that heavy-tailed regularization outperforms conventional regularization techniques in terms of generalization performance.
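As a rough illustration of the first family of penalties, the sketch below shows how a differentiable stable-rank term could be added to a training loss in PyTorch. This is a minimal sketch under stated assumptions, not the paper's implementation: the names stable_rank, heavy_tail_penalty, and the coefficient lam are hypothetical, only fully connected layers are regularized here, and the paper's penalty variants and Weighted Alpha term are not shown.

```python
import torch

def stable_rank(weight: torch.Tensor) -> torch.Tensor:
    # Stable rank ||W||_F^2 / ||W||_2^2: a differentiable surrogate for rank.
    # A heavier-tailed spectrum concentrates mass in a few large singular
    # values, which lowers this ratio, so minimizing it encourages heavy tails.
    frobenius_sq = torch.sum(weight ** 2)
    spectral = torch.linalg.matrix_norm(weight, ord=2)  # largest singular value
    return frobenius_sq / spectral ** 2

def heavy_tail_penalty(model: torch.nn.Module):
    # Sum the stable ranks of all fully connected layers
    # (a hypothetical choice of which weight matrices to regularize).
    penalty = 0.0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            penalty = penalty + stable_rank(module.weight)
    return penalty

# Hypothetical usage inside a training step, with lam a small positive weight:
#   loss = criterion(model(x), y) + lam * heavy_tail_penalty(model)
#   loss.backward()
```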