In deep learning, neural networks serve as noisy channels between input data and their representations. This perspective naturally relates deep learning to the pursuit of constructing channels that are optimal for information transmission and representation. While considerable effort has been devoted to realizing optimal channel properties during network optimization, we study the frequently overlooked possibility that neural networks can be initialized toward optimal channels. Our theory, consistent with experimental validation, identifies the primary mechanisms underlying this possibility and suggests intrinsic connections between statistical physics and deep learning. Unlike conventional theories that characterize neural networks with the classic mean-field approximation, we offer an analytic proof that this widely used simplification scheme is not valid for studying neural networks as information channels. To fill this gap, we develop a corrected mean-field framework for characterizing the limiting behaviors of information propagation in neural networks without strong assumptions on the inputs. Based on this framework, we prove that mutual information between inputs and propagated signals is maximized when neural networks are initialized at dynamic isometry, a regime in which information is transmitted via norm-preserving mappings. These theoretical predictions are validated by experiments on real neural networks, demonstrating the robustness of our theory against finite-size effects. Finally, we analyze our findings through the lens of information bottleneck theory to establish the precise relations among dynamic isometry, mutual information maximization, and optimal channel properties in deep learning.
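To make the notion of dynamic isometry concrete, the following is a minimal numerical sketch (our illustration, not the paper's code): it contrasts orthogonal and i.i.d. Gaussian initialization of a deep tanh network and reports the spread of the singular values of the input-output Jacobian, which concentrate near 1 under dynamic isometry. The width, depth, gain, and input scale are hypothetical settings chosen purely for illustration.

```python
# Sketch (assumed settings, not the authors' experiment): under orthogonal
# initialization the singular values of the input-output Jacobian of a deep
# tanh network stay concentrated near 1 (dynamic isometry), while under
# i.i.d. Gaussian initialization they spread out exponentially with depth.
import numpy as np

rng = np.random.default_rng(0)
width, depth, gain = 128, 32, 1.0  # hypothetical network settings


def orthogonal(n):
    """Sample an n x n orthogonal matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix for a Haar-distributed Q


def jacobian_singular_values(weights, x):
    """Singular values of J = D_L W_L ... D_1 W_1, where
    D_l = diag(tanh'(h_l)) at the pre-activations h_l."""
    J = np.eye(len(x))
    h = x
    for W in weights:
        pre = W @ h
        D = np.diag(1.0 - np.tanh(pre) ** 2)  # derivative of tanh
        J = D @ W @ J                          # chain rule, layer by layer
        h = np.tanh(pre)
    return np.linalg.svd(J, compute_uv=False)


# A small input keeps activations near the linear regime of tanh, so the
# orthogonal-weight Jacobian is close to a product of orthogonal matrices.
x = 1e-2 * rng.standard_normal(width)
ortho = [gain * orthogonal(width) for _ in range(depth)]
gauss = [gain * rng.standard_normal((width, width)) / np.sqrt(width)
         for _ in range(depth)]

for name, ws in [("orthogonal", ortho), ("gaussian", gauss)]:
    s = jacobian_singular_values(ws, x)
    print(f"{name:>10}: max/min singular value = {s.max():.3f} / {s.min():.3g}")
```

Running the sketch shows the Gaussian-initialized Jacobian developing an exploding condition number with depth, whereas the orthogonal initialization keeps all singular values close to 1, i.e., the norm-preserving regime referenced in the abstract.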