This work attempts to provide a plausible theoretical framework for interpreting modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics of modern deep networks. The deep layered architectures, linear and nonlinear operators, and even the parameters of the network are all explicitly constructed layer by layer via forward propagation, although they are amenable to fine-tuning via back propagation. All components of the so-obtained "white-box" network have precise optimization, statistical, and geometric interpretation. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at \url{https://github.com/Ma-Lab-Berkeley}.
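For reference, the rate reduction objective mentioned above can be written out explicitly (a sketch in the standard notation of the rate reduction literature; the symbols $Z$, $\Pi_j$, $\epsilon$, $d$, $m$, and $k$ are conventions assumed here rather than defined in this abstract):
\[
\Delta R(Z, \Pi, \epsilon) \;=\; \underbrace{\frac{1}{2}\log\det\!\Big(I + \frac{d}{m\epsilon^{2}}\, Z Z^{*}\Big)}_{R(Z,\,\epsilon)} \;-\; \underbrace{\sum_{j=1}^{k}\frac{\mathrm{tr}(\Pi_j)}{2m}\log\det\!\Big(I + \frac{d}{\mathrm{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{*}\Big)}_{R_c(Z,\,\epsilon \mid \Pi)},
\]
where $Z \in \mathbb{R}^{d\times m}$ collects the features of $m$ samples, $\Pi_j$ is the diagonal membership matrix of the $j$-th of $k$ classes, and $\epsilon$ is the prescribed coding precision. The layers of ReduNet then arise from iterative gradient ascent on $\Delta R$ with respect to the features $Z$.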