Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During this early phase, we find that dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the gradient of the entire dataset. This counteracts the stochasticity of SGD and limits the influence of individual batches on model training. These findings lead us to a solution for improving performance in underfitting models: early dropout, in which dropout is applied only during the initial phase of training and turned off afterwards. Models equipped with early dropout achieve lower final training loss than their counterparts without dropout. We also explore a symmetric technique for regularizing overfitting models: late dropout, in which dropout is disabled during the early iterations and activated only later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning, and our methods can serve as useful tools for future neural network training, especially in the era of large data. Code is available at https://github.com/facebookresearch/dropout .
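For a concrete picture of the schedules described above, the following is a minimal PyTorch sketch of early dropout: dropout stays active only for the first few epochs and is then disabled (late dropout flips the condition). The toy model, drop rate, and cutoff epoch are illustrative assumptions, not the authors' exact recipe; see the linked repository for the official implementation.

```python
import torch
import torch.nn as nn

def set_dropout_rate(model: nn.Module, p: float) -> None:
    """Set the drop probability of every nn.Dropout layer in the model."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

# Toy model with a single dropout layer (illustrative only).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

early_dropout_epochs = 5   # assumed cutoff; tuned per setting in practice
num_epochs = 20

for epoch in range(num_epochs):
    # Early dropout: keep dropout on during the initial phase, then turn it off.
    # (Late dropout would flip the condition: 0.0 early, 0.1 afterwards.)
    set_dropout_rate(model, 0.1 if epoch < early_dropout_epochs else 0.0)
    model.train()
    x = torch.randn(128, 32)            # stand-in mini-batch
    y = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```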