Much research effort has been devoted to explaining the success of deep learning. Random Matrix Theory (RMT) offers an emerging route to this end: spectral analysis of the large random matrices involved in a trained deep neural network (DNN), such as weight matrices or Hessian matrices arising in the stochastic gradient descent algorithm. In this paper, we conduct extensive experiments on weight matrices across different modules, e.g., layers, networks, and data sets, to analyze the evolution of their spectra. We find that these spectra can be classified into three main types: the Mar\v{c}enko-Pastur spectrum (MP), the Mar\v{c}enko-Pastur spectrum with a few bleeding outliers (MPB), and the heavy-tailed spectrum (HT). Moreover, these discovered spectra are directly connected to the degree of regularization in the DNN. We argue that the degree of regularization depends on the quality of the data fed to the DNN, a phenomenon we call Data-Driven Regularization. These findings are validated on several NNs, using Gaussian synthetic data and real data sets (MNIST and CIFAR10). Finally, exploiting the connection between the spectrum types and the degrees of regularization, we propose a spectral criterion and construct an early stopping procedure that, without using test data, halts training once the NN is found to be highly regularized. Such early-stopped DNNs avoid unnecessary extra training while preserving a comparable generalization ability.
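The three-way classification above can be illustrated with a minimal NumPy sketch: compute the eigenvalues of the sample covariance matrix $W^\top W / n$ of a weight matrix $W$, compare them to the Mar\v{c}enko-Pastur bulk edge $\sigma^2(1+\sqrt{p/n})^2$, and count eigenvalues escaping the bulk. The classification rule, the tolerance, the robust scale estimate, and the outlier-fraction threshold below are illustrative assumptions, not the paper's exact spectral criterion.

```python
import numpy as np

def classify_spectrum(W, tol=0.05, outlier_frac=0.1):
    """Classify the empirical spectrum of a weight matrix as
    'MP', 'MPB', or 'HT'.  Illustrative sketch only: the tolerance
    and thresholds are assumptions, not the paper's criterion."""
    n, p = W.shape
    if n < p:                       # work with the tall orientation
        W, (n, p) = W.T, (p, n)
    c = p / n                       # aspect ratio of W
    # Eigenvalues of the sample covariance matrix W^T W / n
    eigs = np.linalg.eigvalsh(W.T @ W / n)
    # Robust estimate of the entry variance (exact for Gaussian entries)
    sigma2 = (1.4826 * np.median(np.abs(W))) ** 2
    # Marcenko-Pastur upper bulk edge, with a small tolerance buffer
    bulk_edge = sigma2 * (1 + np.sqrt(c)) ** 2 * (1 + tol)
    n_out = int(np.sum(eigs > bulk_edge))
    if n_out == 0:
        return "MP"                 # all eigenvalues inside the MP bulk
    if n_out <= outlier_frac * p:
        return "MPB"                # a few outliers bleed out of the bulk
    return "HT"                     # many large eigenvalues: heavy tail
```

An i.i.d. Gaussian matrix is classified as MP, a Gaussian matrix plus a strong rank-one spike as MPB, and a matrix with Cauchy (heavy-tailed) entries as HT.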