Heavy-tailed distributions have been studied in statistics, random matrix theory, physics, and econometrics as models of correlated systems, among other domains. Furthermore, heavy-tailed eigenvalue distributions of the covariance matrices of neural network weight matrices have been shown empirically to correlate with test set accuracy in several works (e.g. arXiv:1901.08276), but a formal relationship between heavy-tailed parameter distributions and generalization bounds had yet to be demonstrated. In this work, the compression framework of arXiv:1802.05296 is utilized to show that matrices with heavy-tailed matrix elements can be compressed, resulting in networks with sparse weight matrices. Since the parameter count is reduced to the number of non-zero elements of the sparse matrices, the compression framework allows us to bound the generalization gap of the resulting compressed network with a non-vacuous generalization bound. Finally, the action of these matrices on a vector is discussed, and its relation to compression and resilient classification is analyzed.
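The compression step can be illustrated with a minimal numerical sketch: when matrix elements are heavy-tail distributed, a few large-magnitude entries dominate the matrix's action on a vector, so zeroing the small entries yields a sparse matrix that approximately preserves that action. The sketch below assumes a simple hard-threshold truncation and symmetric Pareto-distributed entries purely for illustration; the tail exponent, cutoff, and truncation rule are assumptions and need not match the paper's exact compression scheme.

```python
import numpy as np

# Illustrative sketch (assumptions: Pareto-tailed entries, hard-threshold
# compression; not necessarily the paper's exact scheme).
rng = np.random.default_rng(0)
n = 500

# Heavy-tailed entries: symmetric power-law (Pareto/Lomax) draws.
alpha = 1.5  # tail exponent; smaller alpha => heavier tail
signs = rng.choice([-1.0, 1.0], size=(n, n))
W = signs * rng.pareto(alpha, size=(n, n))

# Compress: keep only entries above a magnitude cutoff (hypothetical rule).
cutoff = 5.0
W_sparse = np.where(np.abs(W) >= cutoff, W, 0.0)

# The effective parameter count is now the number of non-zero elements,
# while the action on a vector is approximately preserved.
sparsity = 1.0 - np.count_nonzero(W_sparse) / W.size
x = rng.standard_normal(n)
rel_err = np.linalg.norm(W @ x - W_sparse @ x) / np.linalg.norm(W @ x)

print(f"fraction of entries zeroed: {sparsity:.3f}")
print(f"relative error of W_sparse @ x vs. W @ x: {rel_err:.3f}")
```

Under these assumptions, most entries are zeroed while the relative error of the compressed matrix-vector product stays small, which is the mechanism by which the reduced non-zero parameter count feeds into the compression-based generalization bound.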