Despite much success, deep learning generally does not perform well with small labeled training sets. In these scenarios, data augmentation has shown much promise in alleviating the need for more labeled data, but it has so far mostly been applied in supervised settings and achieved limited gains. In this work, we propose to apply data augmentation to unlabeled data in a semi-supervised learning setting. Our method, named Unsupervised Data Augmentation or UDA, encourages the model predictions to be consistent between an unlabeled example and an augmented version of the same example. Unlike previous methods that use random noise such as Gaussian noise or dropout noise, UDA has a small twist: it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods. This small twist leads to substantial improvements on six language tasks and three vision tasks, even when the labeled set is extremely small. For example, on the IMDb text classification dataset, with only 20 labeled examples, UDA achieves an error rate of 4.20%, outperforming the state-of-the-art model trained on 25,000 labeled examples. On the standard semi-supervised learning benchmarks CIFAR-10 and SVHN, UDA outperforms all previous approaches, achieving an error rate of 2.7% on CIFAR-10 with only 4,000 examples and an error rate of 2.85% on SVHN with only 250 examples, nearly matching the performance of models trained on the full sets, which are one to two orders of magnitude larger. UDA also works well on large-scale datasets such as ImageNet: when trained with 10% of the labeled set, UDA improves the top-1/top-5 accuracy from 55.1/77.3% to 68.7/88.5%. On the full ImageNet with 1.3M extra unlabeled examples, UDA further pushes the performance from 78.3/94.4% to 79.0/94.5%.
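To make the consistency objective concrete, below is a minimal sketch in PyTorch, assuming a classifier `model` that maps inputs to logits, a small labeled batch, and an unlabeled batch paired with strongly augmented versions of the same examples. The function name `uda_loss` and the weight `lam` are illustrative, not from the paper's released code, and the sketch omits the training refinements of the full method.

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_augmented, lam=1.0):
    """Supervised cross-entropy plus a consistency term on unlabeled data.

    A minimal sketch of the UDA-style objective: `lam` weights the
    consistency loss relative to the supervised loss (hypothetical name).
    """
    # Standard supervised loss on the small labeled batch.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Predicted distribution on the clean unlabeled batch, treated as a
    # fixed target (no gradient flows through it).
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=-1)

    # KL divergence between the fixed clean prediction and the prediction
    # on the strongly augmented version of the same examples.
    log_p_aug = F.log_softmax(model(x_augmented), dim=-1)
    consistency = F.kl_div(log_p_aug, p_clean, reduction="batchmean")

    return sup_loss + lam * consistency
```

The stop-gradient on the clean prediction is what makes this a consistency objective rather than a symmetric smoothing term: the model is pushed to match its prediction on the augmented input to the one on the original input, while the augmentation itself (e.g., back-translation for text or learned image transformations) supplies the "harder and more realistic noise" described above.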