Training very deep neural networks is still an extremely challenging task. The common solution is to use shortcut connections and normalization layers, which are both crucial ingredients in the popular ResNet architecture. However, there is strong evidence to suggest that ResNets behave more like ensembles of shallower networks than truly deep ones. Recently, it was shown that deep vanilla networks (i.e. networks without normalization layers or shortcut connections) can be trained as fast as ResNets by applying certain transformations to their activation functions. However, this method (called Deep Kernel Shaping) isn't fully compatible with ReLUs, and produces networks that overfit significantly more than ResNets on ImageNet. In this work, we rectify this situation by developing a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs. We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets (of the same width/depth), and significantly higher than those obtained with the Edge of Chaos (EOC) method. And unlike with EOC, the validation accuracies we obtain do not get worse with depth.
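To make concrete what "applying certain transformations to their activation functions" can look like for Leaky ReLUs, the sketch below rescales a Leaky ReLU so that it preserves the second moment of a standard Gaussian input; the factor sqrt(2/(1+alpha^2)) follows from E[max(z, alpha*z)^2] = (1+alpha^2)/2 for z ~ N(0,1). This is only a minimal, illustrative sketch of one ingredient of such a transformation, not the paper's actual method, which tailors the negative slope and scaling in a more principled way.

    import numpy as np

    def transformed_leaky_relu(x, alpha=0.2):
        # Leaky ReLU rescaled so a unit-variance Gaussian input keeps
        # unit second moment at the output (illustrative sketch only).
        scale = np.sqrt(2.0 / (1.0 + alpha ** 2))
        return scale * np.where(x >= 0, x, alpha * x)

    # Quick numerical check: the output's second moment stays close to 1.
    z = np.random.randn(1_000_000)
    print(np.mean(transformed_leaky_relu(z) ** 2))  # approximately 1.0

Keeping activations second-moment-preserving in this way is one reason such transformations add negligible computational cost: the change amounts to a fixed elementwise rescaling of an existing activation function.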