Randomly initialized wide neural networks transition to linear functions of their weights as the width grows, within a ball of radius $O(1)$ around initialization. A necessary condition for this result is that every layer of the network is wide enough, i.e., all widths tend to infinity; the transition to linearity breaks down when this infinite-width assumption is violated. In this work we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization. More generally, for $B-1$ bottleneck layers, the network is a degree-$B$ multilinear function of the weights. Importantly, the degree depends only on the number of bottlenecks and not on the total depth of the network.
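The bilinear claim can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the construction or experiments of this work: it assumes a four-layer linear network with hidden widths $m$, $1$, $m$ (a single width-1 bottleneck), i.i.d. standard normal weights, and $1/\sqrt{\text{fan-in}}$ scaling at each layer. The function name `bottleneck_demo` and all variable names are hypothetical. With a scalar bottleneck the network factors as $f = g \cdot c$, where $g$ maps the input to the bottleneck and $c$ maps the bottleneck to the output, so the product of the two linearizations gives a bilinear model that can be compared against the ordinary linearization of $f$.

```python
import numpy as np


def bottleneck_demo(m, d=10, seed=0):
    """Return (linear_error, bilinear_error) for one weight perturbation of norm 1.

    Toy setup (an assumption, not taken from the source): a linear network
    x -> W1 -> u -> [width-1 bottleneck] -> v -> a with 1/sqrt(fan-in) scaling.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    x *= np.sqrt(d) / np.linalg.norm(x)  # normalise so that ||x|| = sqrt(d)

    # Weights at initialization, i.i.d. N(0, 1).
    W1, u = rng.standard_normal((m, d)), rng.standard_normal(m)  # before bottleneck
    v, a = rng.standard_normal(m), rng.standard_normal(m)        # after bottleneck

    def g(W1_, u_):
        # Input -> bottleneck activation (a scalar).
        return u_ @ (W1_ @ x) / np.sqrt(m * d)

    def c(v_, a_):
        # Gain from the bottleneck activation to the output (fan-in of v is 1).
        return a_ @ v_ / np.sqrt(m)

    g0, c0 = g(W1, u), c(v, a)

    # Analytic gradients of g and c at initialization.
    dg_dW1, dg_du = np.outer(u, x) / np.sqrt(m * d), W1 @ x / np.sqrt(m * d)
    dc_dv, dc_da = a / np.sqrt(m), v / np.sqrt(m)

    # Perturb along the gradients, where the departure from linearity is most
    # visible. Each half has norm 1/sqrt(2), so the full perturbation has norm 1.
    s1 = 1.0 / (np.sqrt(2) * np.sqrt(np.sum(dg_dW1**2) + np.sum(dg_du**2)))
    s2 = 1.0 / (np.sqrt(2) * np.sqrt(np.sum(dc_dv**2) + np.sum(dc_da**2)))
    dW1, du = s1 * dg_dW1, s1 * dg_du
    dv, da = s2 * dc_dv, s2 * dc_da

    # First-order Taylor expansions of each sub-network around initialization.
    g_lin = g0 + np.sum(dg_dW1 * dW1) + dg_du @ du
    c_lin = c0 + dc_dv @ dv + dc_da @ da

    f_true = g(W1 + dW1, u + du) * c(v + dv, a + da)
    f_lin = g0 * c0 + c0 * (g_lin - g0) + g0 * (c_lin - c0)  # linearization of f
    f_bilin = g_lin * c_lin                                  # bilinear model

    return abs(f_true - f_lin), abs(f_true - f_bilin)


if __name__ == "__main__":
    for width in (16, 256, 4096):
        lin_err, bil_err = bottleneck_demo(width)
        print(f"m={width:5d}  linear error={lin_err:.4f}  bilinear error={bil_err:.6f}")
```

In this toy setting the gap between $f$ and its linearization is dominated by the cross term between the two sub-networks, which does not shrink with $m$, whereas the gap to the bilinear model consists only of the sub-networks' own Taylor remainders, which do.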