This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation. We show that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, that shares common characteristics of modern deep networks. The deep layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer via forward propagation, instead of learned via back propagation. All components of the so-obtained "white-box" network have precise optimization, statistical, and geometric interpretations. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a deep convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at https://github.com/Ma-Lab-Berkeley.
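As a rough, simplified illustration of the rate reduction objective described above, the following numpy sketch computes the coding rate of a feature matrix and the (weighted) average coding rate of its per-class subsets; their difference is the quantity being maximized. This is only a sketch under our own conventions: the function names (coding_rate, rate_reduction), the distortion value eps, and the toy data are illustrative assumptions and are not taken from the released code.

    import numpy as np

    def coding_rate(Z, eps=0.5):
        # R(Z, eps): (1/2) log det(I + (d / (m eps^2)) Z Z^T) for a d x m feature matrix,
        # i.e., the number of bits (up to a log base) needed to encode Z up to distortion eps.
        d, m = Z.shape
        return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)[1]

    def rate_reduction(Z, labels, eps=0.5):
        # Delta R = R(Z) - sum_j (m_j / m) R(Z_j): coding rate of the whole dataset minus the
        # weighted average coding rate of the class subsets. Larger values mean the features of
        # all classes together are expanded while each class is individually compressed.
        d, m = Z.shape
        whole = coding_rate(Z, eps)
        compressed = 0.0
        for j in np.unique(labels):
            Zj = Z[:, labels == j]
            mj = Zj.shape[1]
            Rj = 0.5 * np.linalg.slogdet(np.eye(d) + (d / (mj * eps**2)) * Zj @ Zj.T)[1]
            compressed += (mj / m) * Rj
        return whole - compressed

    # Toy usage (hypothetical data): features normalized to the unit sphere with two classes.
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(8, 100))
    Z /= np.linalg.norm(Z, axis=0, keepdims=True)
    labels = np.repeat([0, 1], 50)
    print(rate_reduction(Z, labels))

In the ReduNet construction, each layer can be read as one gradient ascent step on this objective with respect to the features; the sketch above only evaluates the objective itself.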