Although deep neural networks have been immensely successful, we lack a comprehensive theoretical understanding of how they work and why they are structured the way they are. As a result, deep networks are often treated as black boxes with unclear interpretations and reliability. Understanding the performance of deep neural networks is one of the greatest open scientific challenges. This work applies principles and techniques from information theory to deep learning models, both to deepen our theoretical understanding and to design better algorithms. We first describe our information-theoretic approach to deep learning. We then propose the Information Bottleneck (IB) theory as an explanatory framework for deep learning systems. This paradigm for analyzing networks sheds light on their layered structure, generalization abilities, and learning dynamics. Next, we address one of the most challenging problems in applying the IB to deep neural networks: estimating mutual information. Building on recent theoretical developments such as the neural tangent kernel (NTK) framework, we investigate signals of generalization and obtain tractable computations of many information-theoretic quantities and their bounds for infinite ensembles of infinitely wide neural networks. These derivations let us determine how compression, generalization, and sample size relate to one another and to the network. Finally, we present the dual Information Bottleneck (dualIB), a new information-theoretic framework that resolves some of the IB's shortcomings by merely switching the terms in the distortion function. The dualIB can account for known data features and exploit them to make better predictions on unseen examples. We derive its underlying structure and optimal representations within an analytical framework, and validate the results with a variational framework using deep neural network optimization.
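For concreteness, the distortion switch at the heart of the dualIB can be written out. The following is a schematic sketch in standard IB notation (T is the compressed representation, β the trade-off parameter), not a full derivation:

```latex
% IB: compress X into a representation T while preserving information about Y
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y),
\qquad
d_{\mathrm{IB}}(x,t) = D_{\mathrm{KL}}\!\big[\, p(y \mid x) \,\big\|\, p(y \mid t) \,\big].

% dualIB: the same objective, with the KL arguments switched in the distortion
d_{\mathrm{dualIB}}(x,t) = D_{\mathrm{KL}}\!\big[\, p(y \mid t) \,\big\|\, p(y \mid x) \,\big].
```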
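To illustrate why mutual information estimation is difficult in deep networks, here is a minimal sketch of the classic binning estimator for I(T;Y) between continuous layer activations and discrete labels. The function name and the random data are hypothetical; the estimator's bias and its sensitivity to the bin count are exactly the difficulty referred to above:

```python
import numpy as np

def binned_mutual_information(t, y, n_bins=30):
    """Estimate I(T; Y) by discretizing continuous activations T into bins.

    t : (n_samples, n_units) array of layer activations
    y : (n_samples,) array of discrete labels

    Binning is the simplest estimator; its value depends strongly on
    n_bins, which is one reason MI estimation in deep nets is hard.
    """
    # Discretize each unit's activations into equal-width bins.
    edges = np.linspace(t.min(), t.max(), n_bins + 1)
    t_binned = np.digitize(t, edges)

    # Treat each binned activation vector as a single discrete symbol.
    _, t_ids = np.unique(t_binned, axis=0, return_inverse=True)
    _, y_ids = np.unique(y, return_inverse=True)

    def entropy(ids):
        # Shannon entropy (in bits) of the empirical distribution over ids.
        p = np.bincount(ids) / len(ids)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # I(T; Y) = H(T) + H(Y) - H(T, Y)
    joint_ids = t_ids * (y_ids.max() + 1) + y_ids
    return entropy(t_ids) + entropy(y_ids) - entropy(joint_ids)

# Hypothetical usage with random data:
rng = np.random.default_rng(0)
t = rng.normal(size=(1000, 8))       # activations of one layer
y = rng.integers(0, 10, size=1000)   # class labels
print(binned_mutual_information(t, y))
```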