To understand the essential role of depth in neural networks, we investigate a variational principle for depth: does increasing depth implicitly optimize the representations in neural networks? We prove that random neural networks equipped with batch normalization maximize, up to constant factors, the differential entropy of their representations as depth increases, assuming that the representations are contractive. Thus, at initialization, in the absence of any information about the learning task, representations inherently obey the \textit{principle of maximum entropy}. Our variational formulation for neural representations characterizes the interplay between representation entropy and architectural components, including depth, width, and non-linear activations, and may thereby inform the design of neural architectures.
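The phenomenon can be illustrated numerically. The following minimal sketch (not from the paper; the helper names \texttt{gaussian\_entropy\_proxy} and \texttt{batch\_norm}, the contractive weight scaling, and the Gaussian log-determinant proxy for differential entropy are illustrative assumptions) compares a random batch-normalized network with a plain contractive one: with batch normalization the entropy proxy remains near its maximum across depth, whereas without it the representations collapse and the entropy diverges to $-\infty$.

\begin{verbatim}
import numpy as np

def gaussian_entropy_proxy(z):
    """Differential entropy of a Gaussian fitted to the rows of z:
    0.5 * logdet(2*pi*e*Sigma); a standard proxy (and upper bound)."""
    sigma = np.cov(z, rowvar=False) + 1e-6 * np.eye(z.shape[1])
    _, logdet = np.linalg.slogdet(2 * np.pi * np.e * sigma)
    return 0.5 * logdet

def batch_norm(z, eps=1e-5):
    """Batch normalization at initialization (no learned scale/shift)."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
n, width, depth = 4096, 64, 30

z_bn = rng.standard_normal((n, width))
z_plain = z_bn.copy()

for layer in range(depth):
    # Random weights scaled so that the plain network is contractive.
    w = rng.standard_normal((width, width)) * (0.9 / np.sqrt(width))
    z_bn = batch_norm(np.tanh(z_bn @ w))
    z_plain = np.tanh(z_plain @ w)
    if (layer + 1) % 10 == 0:
        print(f"depth {layer + 1:3d}: "
              f"entropy with BN = {gaussian_entropy_proxy(z_bn):8.2f}, "
              f"without BN = {gaussian_entropy_proxy(z_plain):8.2f}")
\end{verbatim}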