While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tasks, including transfer learning. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization, for generalization in deep learning. Notably, we find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor. We also argue for data-independent bounds in explaining generalization.
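To make the compression idea concrete, here is a minimal, hypothetical sketch in numpy of representing trained weights as an offset plus a fixed random linear subspace and then quantizing the subspace coordinates. All names (subspace dimension k, num_levels, the uniform codebook) are illustrative assumptions for exposition, not the paper's actual training procedure or quantization scheme.

```python
# Illustrative sketch only: parameters are expressed as theta = theta_0 + P @ w,
# where P spans a low-dimensional linear subspace, and the coordinates w are
# quantized so the model admits a short description (few bits).
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the flattened weights of a trained network (d parameters).
d = 10_000
theta = rng.normal(size=d).astype(np.float32)

# Fixed random projection P: R^k -> R^d defines the linear subspace.
# In practice the subspace is fixed up front and only w (k numbers) is trained.
k = 250                                   # subspace dimension, k << d
theta_0 = np.zeros(d, dtype=np.float32)   # anchor point (e.g. initialization)
P = rng.normal(size=(d, k)).astype(np.float32) / np.sqrt(d)

# Least-squares coordinates of theta in the subspace
# (a stand-in here for training w directly).
w, *_ = np.linalg.lstsq(P, theta - theta_0, rcond=None)

# Quantize w with a small shared codebook (uniform levels for simplicity;
# a learned codebook would typically compress better).
num_levels = 16
levels = np.linspace(w.min(), w.max(), num_levels)
codes = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
w_quantized = levels[codes]

# Reconstructed (compressed) parameters and a rough description length:
# k codebook indices plus the codebook stored at float32 precision.
theta_compressed = theta_0 + P @ w_quantized
bits = k * np.ceil(np.log2(num_levels)) + num_levels * 32
print(f"subspace dim {k}: ~{int(bits)} bits vs {d * 32} bits for raw float32 weights")
```

The point of the sketch is the bookkeeping: once the weights are pinned to k quantized coordinates in a fixed subspace, the number of bits needed to describe the model scales with k rather than with the full parameter count d, which is what makes compression-based generalization bounds tight.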