We analyze the inductive bias of gradient descent for weight-normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. Our analysis focuses on exponential weight normalization (EWN), which encourages weight updates along the radial direction. This paper shows that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate, and hence causes the weights to be updated in a way that prefers asymptotic relative sparsity. These results can be extended to hold for gradient descent via an appropriate adaptive learning rate. The asymptotic convergence rate of the loss in this setting is given by $\Theta(\frac{1}{t(\log t)^2})$, and is independent of the depth of the network. We contrast these results with the inductive bias of standard weight normalization (SWN) and unnormalized architectures, and demonstrate their implications on synthetic data sets. Experimental results on simple data sets and architectures support our claim of sparse EWN solutions, even with SGD. This demonstrates the potential of EWN for learning prunable neural networks.
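For concreteness, the sketch below illustrates the SWN and EWN parameterizations referred to above for a single linear layer, assuming per-output-unit normalization. It is an illustrative PyTorch sketch, not the implementation used in the experiments, and the class names `SWNLinear` and `EWNLinear` are hypothetical.

```python
# Minimal sketch of the two parameterizations contrasted in the abstract (SWN vs. EWN),
# assuming per-output-unit normalization; illustrative only, not the authors' code,
# and the class names are hypothetical.
import torch
import torch.nn as nn


class SWNLinear(nn.Module):
    """Standard weight normalization: w_i = gamma_i * v_i / ||v_i|| for each output unit."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features))
        self.gamma = nn.Parameter(torch.ones(out_features))

    def forward(self, x):
        w = self.gamma.unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return x @ w.t()


class EWNLinear(nn.Module):
    """Exponential weight normalization: w_i = exp(a_i) * v_i / ||v_i|| for each output unit."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features))
        self.a = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # The exponential scale exp(a_i) is the feature the abstract describes as
        # encouraging weight updates along the radial direction.
        w = self.a.exp().unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return x @ w.t()
```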