The geometric structure of the optimization landscape is argued to be fundamental to the success of deep neural network learning. A direct computation of the landscape beyond two layers is hard. Therefore, to capture a global view of the landscape, an interpretable model of the network-parameter (or weight) space must be established; however, such a model has been lacking so far. Furthermore, it remains unknown what the landscape looks like for deep networks with binary synapses, which play a key role in robust and energy-efficient neuromorphic computation. Here, we propose a statistical mechanics framework by directly building a least-structured model of the high-dimensional weight space, taking into account realistic structured data, stochastic gradient descent training, and the computational depth of the network. We also consider whether the number of network parameters outnumbers the number of supplied training examples, i.e., over- or under-parametrization. Our least-structured model reveals that the weight spaces of the under- and over-parametrized cases belong to the same class, in the sense that these weight spaces are well connected without any hierarchical clustering structure. In contrast, the shallow network has a broken weight space, characterized by a discontinuous phase transition, thereby clarifying the benefit of depth in deep learning from the angle of high-dimensional geometry. Our effective model also reveals that inside a deep network there exists a liquid-like central part of the architecture, in the sense that the weights in this part behave as randomly as possible, which has algorithmic implications. Our data-driven model thus provides a statistical mechanics insight into why deep learning is unreasonably effective in terms of the high-dimensional weight space, and how deep networks differ from shallow ones.