Regularization plays a major role in modern deep learning. From classic techniques such as L1 and L2 penalties to noise-based methods such as Dropout, regularization often yields better generalization by avoiding overfitting. Recently, Stochastic Depth (SD) has emerged as an alternative regularization technique for residual neural networks (ResNets) and has been shown to boost the performance of ResNets on many tasks [Huang et al., 2016]. Despite the recent success of SD, little is known about this technique from a theoretical perspective. This paper provides a hybrid analysis combining perturbation analysis and signal propagation to shed light on the different regularization effects of SD. Our analysis allows us to derive principled guidelines for choosing the survival rates used for training with SD.
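As a concrete illustration of the technique the abstract discusses, the following is a minimal sketch of a residual block with Stochastic Depth, following Huang et al. [2016]: during training the residual branch is dropped with probability 1 minus its survival rate, and at test time its output is scaled by that rate. The function names and the generic branch `f` are illustrative, not from the paper.

```python
import random

def residual_block_sd(x, f, survival_rate, training, rng=random):
    """Residual block with Stochastic Depth (illustrative sketch).

    During training, the residual branch f is kept with probability
    `survival_rate` and skipped otherwise; at inference, f's output is
    scaled by `survival_rate` to match its expected training contribution.
    """
    if training:
        if rng.random() < survival_rate:
            return x + f(x)  # block survives: full residual update
        return x             # block dropped: identity shortcut only
    # inference: deterministic, expectation-matched forward pass
    return x + survival_rate * f(x)
```

Choosing the per-block survival rates (e.g., constant versus linearly decaying with depth) is exactly the design question the paper's analysis addresses.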