We analyze the inductive bias of gradient descent for weight-normalized smooth homogeneous neural nets trained on the exponential or cross-entropy loss. We study both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these results to gradient descent and establish asymptotic relations between weights and gradients for both SWN and EWN. We also show that EWN updates the weights in a way that prefers asymptotic relative sparsity. For EWN, we provide a finite-time convergence rate of the loss under gradient flow and a tight asymptotic convergence rate under gradient descent. We demonstrate our results for SWN and EWN on synthetic datasets. Experimental results on simple datasets support our claim of sparse EWN solutions, even with SGD, demonstrating the potential of EWN for learning neural networks amenable to pruning.
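As a minimal sketch of the two parameterizations referred to above, assuming the usual weight-normalization conventions (the exact EWN form below is our assumption and is not stated in this abstract): for a weight vector $w$ with direction parameter $v$ and a trainable scale $\gamma$ (SWN) or $\alpha$ (EWN),
$$\text{SWN:}\quad w = \gamma\,\frac{v}{\lVert v\rVert}, \qquad\qquad \text{EWN:}\quad w = e^{\alpha}\,\frac{v}{\lVert v\rVert}.$$
The only difference is that EWN parameterizes the scale exponentially rather than linearly.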