The information bottleneck (IB) principle depicts a trade-off between the accuracy and conciseness of encoded representations. IB has succeeded in explaining the objective and behavior of neural networks (NNs), as well as in learning better representations. However, there remain critiques of its universality: for instance, the phase-transition phenomenon often fades away, representation compression is not causally related to generalization, and IB is trivial in deterministic cases. In this work, we build a new IB based on the trade-off between the accuracy and complexity of the learned weights of NNs. We argue that this new IB has a more solid connection to the objective of NNs, since the information stored in weights (IIW) bounds their PAC-Bayes generalization capability; hence we name it the PAC-Bayes IB (PIB). Through IIW, we identify the phase-transition phenomenon in general settings and solidify the causal link between compression and generalization. We then derive a tractable solution of PIB and design a stochastic inference algorithm based on Markov chain Monte Carlo sampling. We empirically verify our claims through extensive experiments and further substantiate the advantage of the proposed algorithm for training NNs.
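To make the bound behind the IIW claim concrete, here is a minimal sketch assuming the standard McAllester-style PAC-Bayes bound; the notation $q(w|S)$, $p(w)$, $\hat{L}_S$ is illustrative rather than quoted from the paper. With probability at least $1-\delta$ over an i.i.d. training set $S$ of size $n$,
$$
\mathbb{E}_{w\sim q(w|S)}\big[L(w)\big] \;\le\; \mathbb{E}_{w\sim q(w|S)}\big[\hat{L}_S(w)\big] \;+\; \sqrt{\frac{\mathrm{KL}\!\left(q(w|S)\,\|\,p(w)\right)+\log(n/\delta)}{2(n-1)}},
$$
where $L$ and $\hat{L}_S$ are the population and empirical risks. Since $\mathbb{E}_S[\mathrm{KL}(q(w|S)\,\|\,p(w))] \ge I(w;S)$ for any data-independent prior $p(w)$, with equality at the marginal $p(w)=\mathbb{E}_S[q(w|S)]$, penalizing the IIW $I(w;S)$ directly tightens this generalization bound, motivating a PIB-style objective of the form $\min_{q(w|S)} \mathbb{E}\big[\hat{L}_S(w)\big] + \beta\, I(w;S)$.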
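The MCMC-based inference mentioned above can be illustrated with the following hypothetical sketch, which is not the paper's exact algorithm: it draws approximate posterior weight samples with stochastic gradient Langevin dynamics (SGLD) and moment-matches a diagonal Gaussian to them so that the KL term against an isotropic Gaussian prior, and hence the IIW penalty, has a closed form. All function and variable names here are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sgld_step(model, loss, lr=1e-3):
    """One SGLD update: a gradient step on the loss plus Gaussian noise
    with std sqrt(2*lr), so the chain targets exp(-loss) as lr -> 0."""
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.add_(p.grad, alpha=-lr)
            p.add_(torch.randn_like(p), alpha=(2 * lr) ** 0.5)

def kl_to_isotropic_prior(draws, prior_var=1.0):
    """KL( N(mu, diag(var)) || N(0, prior_var * I) ), with mu and var
    moment-matched to the stacked posterior weight samples `draws`."""
    mu, var = draws.mean(0), draws.var(0) + 1e-8
    ratio = var / prior_var
    return 0.5 * (ratio + mu.pow(2) / prior_var - 1.0 - ratio.log()).sum()

# Toy usage: run SGLD on a small classifier, collect flattened weight
# samples, then report the closed-form KL as a (crude) IIW estimate.
model = nn.Linear(10, 2)
X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
draws = []
for _ in range(200):
    sgld_step(model, F.cross_entropy(model(X), y))
    draws.append(torch.cat([p.detach().flatten() for p in model.parameters()]))
iiw_estimate = kl_to_isotropic_prior(torch.stack(draws[100:]))  # drop burn-in
print(f"approximate IIW penalty: {iiw_estimate.item():.3f}")
```

Note the simplifications: the noise scale assumes the loss plays the role of an unnormalized negative log-posterior, whereas a faithful SGLD sampler would scale minibatch gradients by the dataset size and anneal the step size.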