How much information does a learning algorithm extract from the training data and store in a neural network's weights? Too much, and the network would overfit to the training data. Too little, and the network would not fit anything at all. Na\"ively, the amount of information the network stores should scale in proportion to the number of trainable weights. This raises the question: how can neural networks with vastly more weights than training examples still generalise? A simple resolution to this conundrum is that the number of weights is usually a bad proxy for the actual amount of information stored. For instance, typical weight vectors may be highly compressible. Then another question arises: is it possible to compute the actual amount of information stored? This paper derives both a consistent estimator and a closed-form upper bound on the information content of infinitely wide neural networks. The derivation is based on an identification between neural information content and the negative log probability of a Gaussian orthant. This identification yields bounds that analytically control the generalisation behaviour of the entire solution space of infinitely wide networks. The bounds have a simple dependence on both the network architecture and the training data. Corroborating the findings of Valle-P\'erez et al. (2019), who conducted a similar analysis using approximate Gaussian integration techniques, the bounds are found to be both non-vacuous and correlated with the empirical generalisation behaviour at finite width.
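As a rough illustration of the stated identification (the notation here is hypothetical and may differ from the paper's own definitions): writing $K$ for the NNGP kernel evaluated on the training inputs $x_1, \dots, x_n$ and $y_i \in \{\pm 1\}$ for the binary labels, the information content of the solution space is identified with the negative log probability that a centred Gaussian draw lands in the orthant selected by the labels,
\[
I(\mathcal{D}) \;=\; -\log_2 \, \Pr_{f \sim \mathcal{N}(0, K)}\bigl[\operatorname{sign}(f_i) = y_i \ \text{for all } i = 1, \dots, n\bigr].
\]
On this reading, the consistent estimator and the closed-form upper bound mentioned above amount, respectively, to estimating and analytically controlling this Gaussian orthant probability.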