We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite-width network allows data-dependent feature learning in its bottleneck representation. We show empirically that a single bottleneck in an infinite network dramatically accelerates training compared to purely infinite networks, while also improving overall performance. We discuss this acceleration phenomenon by drawing an analogy to infinitely wide deep linear models, where the accelerating effect of a bottleneck can be understood theoretically.
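As a rough illustration (not the paper's code), the architecture in question can be sketched as a standard MLP whose hidden layers are taken very wide, standing in for the infinite-width limit, with a single narrow layer in the middle acting as the finite bottleneck. The widths, depth, and NTK-style scaling below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: a wide MLP with one finite bottleneck layer (hypothetical sizes).
import jax
import jax.numpy as jnp

WIDTH = 4096        # proxy for the "infinite" width
BOTTLENECK = 8      # finite bottleneck size (assumed for illustration)

def init_params(key, sizes):
    """Weights ~ N(0, 1); NTK-style 1/sqrt(fan_in) scaling is applied in the forward pass."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W / jnp.sqrt(W.shape[0]) + b
        if i < len(params) - 1:       # nonlinearity on all but the output layer
            x = jax.nn.relu(x)
    return x

# wide -> wide -> finite bottleneck -> wide -> output
sizes = [32, WIDTH, WIDTH, BOTTLENECK, WIDTH, 1]
params = init_params(jax.random.PRNGKey(0), sizes)
y = forward(params, jnp.ones((16, 32)))   # shape (16, 1)
```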