The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy, including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.
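As a minimal sketch of the binning-based MI estimation the abstract refers to (not the authors' code), the following assumes a deterministic network evaluated on a finite dataset with distinct inputs, so that I(X; T) reduces to the entropy H(T) of the binned activation patterns and I(T; Y) = H(T) - H(T | Y). The names `binned_mi`, `num_bins`, and the synthetic activations are illustrative assumptions.

```python
import numpy as np

def entropy(patterns: np.ndarray) -> float:
    """Shannon entropy (in bits) of the empirical distribution over rows."""
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def binned_mi(activations: np.ndarray, labels: np.ndarray, num_bins: int = 30):
    """Estimate I(X;T) and I(T;Y) after discretizing activations into equal-width bins."""
    bins = np.linspace(activations.min(), activations.max(), num_bins + 1)
    t = np.digitize(activations, bins)      # binned activation patterns T
    i_xt = entropy(t)                        # distinct inputs, deterministic net: I(X;T) = H(T)
    h_t_given_y = 0.0
    for y in np.unique(labels):
        mask = labels == y
        h_t_given_y += mask.mean() * entropy(t[mask])
    i_ty = i_xt - h_t_given_y                # I(T;Y) = H(T) - H(T|Y)
    return i_xt, i_ty

# Changing num_bins changes the estimates, which illustrates the binning
# sensitivity discussed in the abstract (synthetic stand-in data).
rng = np.random.default_rng(0)
acts = np.tanh(rng.normal(size=(1000, 8)))
labels = rng.integers(0, 2, size=1000)
for nb in (10, 100):
    print(nb, binned_mi(acts, labels, num_bins=nb))
```

For quantized networks, as studied in the paper, the activation patterns are already discrete, so the same entropy computation applies without the `np.digitize` step and therefore without any binning-induced measurement error.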