Large neural networks are often overparameterized and prone to overfitting; Dropout is a widely used regularization technique for combating overfitting and improving model generalization. However, unstructured Dropout is not always effective for specific network architectures, which has motivated a number of structured Dropout approaches that improve model performance and, in some cases, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks across multiple state-of-the-art networks. Additionally, we devise a structured Dropout approach we call \textbf{\emph{ProbDropBlock}}, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that, with a simple scheduling strategy, the proposed approach consistently improves model performance over baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show that \textbf{\emph{ProbDropBlock}} improves RoBERTa fine-tuning on MNLI by $0.22\%$, and training of ResNet50 on ImageNet by $0.28\%$.
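To make the one-sentence description of \textbf{\emph{ProbDropBlock}} concrete, the following PyTorch snippet is a minimal sketch of salience-weighted block dropout, not the paper's implementation: the salience proxy (channel-mean absolute activation), the function name \texttt{prob\_drop\_block}, and the parameters \texttt{drop\_rate} and \texttt{block\_size} are all illustrative assumptions.
\begin{verbatim}
# Minimal sketch of salience-weighted block dropout (assumed details,
# not the paper's reference implementation).
import torch
import torch.nn.functional as F

def prob_drop_block(x: torch.Tensor, drop_rate: float = 0.1,
                    block_size: int = 5, training: bool = True) -> torch.Tensor:
    """x: feature map of shape (N, C, H, W)."""
    if not training or drop_rate == 0.0:
        return x
    n, c, h, w = x.shape
    # Salience proxy: mean absolute activation across channels,
    # normalized to sum to 1 over each feature map.
    salience = x.abs().mean(dim=1, keepdim=True)                  # (N, 1, H, W)
    salience = salience / salience.sum(dim=(2, 3), keepdim=True).clamp_min(1e-12)
    # Scale so the expected fraction of sampled block centres matches drop_rate.
    seed_prob = (salience * h * w * drop_rate).clamp(max=1.0)
    seeds = torch.bernoulli(seed_prob)                            # block centres
    # Expand each centre into a contiguous block via max pooling.
    block_mask = F.max_pool2d(seeds, kernel_size=block_size,
                              stride=1, padding=block_size // 2)
    keep_mask = 1.0 - block_mask
    # Rescale kept activations to preserve the expected magnitude.
    scale = keep_mask.numel() / keep_mask.sum().clamp_min(1.0)
    return x * keep_mask * scale
\end{verbatim}
In this sketch, more salient spatial locations are more likely to seed a dropped block, so regularization is concentrated on the features the network relies on most; the scheduling of \texttt{drop\_rate} over training mentioned in the abstract is left out for brevity.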