We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities. We find that, even allowing for outliers, the observed spectral shapes strongly deviate from these theoretical predictions. This raises major questions about the usefulness of such models in deep learning. We further show that theoretical results, such as the layered nature of critical points, depend strongly on the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles: random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match the observed spectra. They also produce large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one-dimensional paths of low loss values. We further show that, in the case of a random matrix product, the weight of the discrete spectral component at $0$ depends on the ratio of the dimensions of the weight matrices.
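The final claim, that the weight of the spectral atom at $0$ is governed by the dimension ratios, can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes plain i.i.d. Gaussian weight matrices (rather than the exact ensembles studied here) and simply measures the fraction of near-zero eigenvalues of the Gram matrix of a two-matrix product, comparing it to the rank-based value $\max(0, 1 - \min(m, n)/p)$. The function name `zero_mass_of_product` and the chosen dimensions are illustrative only.

```python
import numpy as np

# Sketch (assumption: i.i.d. Gaussian entries, not the paper's exact ensembles).
# The product of an (m x n) and an (n x p) matrix has rank at most min(m, n, p),
# so its (p x p) Gram matrix carries a point mass at 0 of weight
# max(0, 1 - min(m, n) / p), controlled by the ratio of the dimensions.

def zero_mass_of_product(m, n, p, tol=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((m, n)) / np.sqrt(n)   # first weight matrix
    W2 = rng.standard_normal((n, p)) / np.sqrt(p)   # second weight matrix
    P = W1 @ W2                                     # (m x p) random matrix product
    evals = np.linalg.eigvalsh(P.T @ P)             # spectrum of the Gram matrix
    return np.mean(evals < tol)                     # empirical weight of the atom at 0

for m, n, p in [(400, 400, 400), (400, 200, 400), (400, 100, 400)]:
    empirical = zero_mass_of_product(m, n, p)
    predicted = max(0.0, 1.0 - min(m, n) / p)
    print(f"m={m}, n={n}, p={p}: empirical mass at 0 = {empirical:.3f}, "
          f"rank-based prediction = {predicted:.3f}")
```

Shrinking the inner dimension $n$ relative to $p$ increases the fraction of exactly-zero eigenvalues, which is the qualitative behaviour the abstract attributes to random matrix products.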