A residual neural network (ResNet) is an artificial neural network (ANN) that uses skip connections to jump over some layers. A typical ResNet block skips two or three layers and contains nonlinearities (ReLU) in between. Residual networks are easy to optimize and can gain accuracy from considerably increased depth: the skip connections inside the residual blocks alleviate the vanishing-gradient problem that otherwise arises when depth is added to a deep neural network.
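
To make the skip connection concrete, here is a minimal sketch of a two-layer residual block in PyTorch; the class name BasicResidualBlock and the particular choice of 3×3 convolutions with batch normalization are illustrative assumptions rather than a specific published architecture:

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Two-layer residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the block input back before the final ReLU,
        # giving gradients a direct path around the convolutions.
        return self.relu(out + x)
```

Because the identity path bypasses the two convolutional layers, gradients can flow through the addition unchanged, which is why depth can be increased without the training signal vanishing.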

The memory cost of training deep residual neural networks (ResNets) with backpropagation grows linearly with network depth. One way to circumvent this issue is to use invertible architectures. This paper proposes to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), are invertible. Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block. We show that Momentum ResNets can be interpreted as second-order ordinary differential equations (ODEs) and precisely characterize how progressively adding momentum increases the representation capacity of Momentum ResNets. Our analysis shows that Momentum ResNets can learn any linear mapping up to a multiplicative factor, while ResNets cannot. In a learning-to-optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds where existing invertible architectures fail. On CIFAR and ImageNet, we show that Momentum ResNets achieve the same accuracy as ResNets with a much smaller memory footprint, and that pre-trained Momentum ResNets are promising for model fine-tuning.

https://www.zhuanzhi.ai/paper/867b3834167694dab97cf812135dc273
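
The abstract does not spell out the update equations, but a minimal sketch of a momentum forward rule of the kind it describes could look like the following; the class name MomentumResidualBlock, the coefficient gamma, and the specific form v ← γ·v + (1−γ)·f(x), x ← x + v are assumptions made for illustration, not taken verbatim from the paper:

```python
import torch.nn as nn

class MomentumResidualBlock(nn.Module):
    """Sketch of a momentum forward rule wrapped around a residual branch f."""
    def __init__(self, f, gamma=0.9):
        super().__init__()
        self.f = f          # any existing ResNet residual branch (drop-in)
        self.gamma = gamma  # momentum coefficient (assumed hyperparameter)

    def forward(self, x, v):
        # v <- gamma * v + (1 - gamma) * f(x);  x <- x + v
        v = self.gamma * v + (1.0 - self.gamma) * self.f(x)
        x = x + v
        return x, v

    def inverse(self, x, v):
        # The update can be undone exactly, so intermediate activations
        # need not be stored for backpropagation.
        x = x - v
        v = (v - (1.0 - self.gamma) * self.f(x)) / self.gamma
        return x, v
```

Because each step can be inverted in closed form, activations can be recomputed from the output during the backward pass instead of being cached, so the memory cost no longer needs to grow linearly with depth.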

Latest papers

Deep neural networks are highly expressive machine learning models with the ability to interpolate arbitrary datasets. Deep nets are typically optimized via first-order methods, and the optimization process crucially depends on the characteristics of the network as well as the dataset. This work sheds light on the relation between the network size and the properties of the dataset, with an emphasis on deep residual networks (ResNets). Our contribution is to show that, if the network Jacobian is full rank, gradient descent for the quadratic loss with smooth activations converges to a global minimum even if the network width $m$ of the ResNet scales only linearly with the sample size $n$, independently of the network depth. To the best of our knowledge, this is the first work that provides a theoretical guarantee for the convergence of neural networks in the $m=\Omega(n)$ regime.
