A recent line of work has established intriguing connections between the generalization/compression properties of a deep neural network (DNN) model and the so-called stable ranks of its layer weights. Intuitively, the latter are indicators of the effective number of parameters in the network. In this work, we address some natural questions regarding the space of DNNs conditioned on the layers' stable ranks, studying feed-forward dynamics, initialization, training, and expressivity. To this end, we first propose a random DNN model with a new sampling scheme based on stable rank. We then show how feed-forward maps are affected by the constraint and how training evolves in the overparametrized regime (via Neural Tangent Kernels). Our results imply that stable ranks enter layerwise essentially as linear factors whose effect accumulates exponentially with depth. Moreover, we provide an empirical analysis suggesting that stable rank initialization alone can lead to convergence speed-ups.
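For concreteness, the stable rank referred to above is commonly defined as the ratio of the squared Frobenius norm to the squared spectral norm of a weight matrix; the following is the standard convention (the paper's exact normalization may differ):

\[
\operatorname{srank}(W) \;=\; \frac{\|W\|_F^2}{\|W\|_2^2} \;=\; \frac{\sum_i \sigma_i(W)^2}{\sigma_{\max}(W)^2} \;\le\; \operatorname{rank}(W),
\]

where \(\sigma_i(W)\) denote the singular values of \(W\). Unlike the usual rank, this quantity is stable under small perturbations of \(W\), which is what makes it a natural "effective parameter count" for trained weights.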