We present a simple self-training method that achieves 87.4% top-1 accuracy on ImageNet, which is 1.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 16.6% to 74.2%, reduces ImageNet-C mean corruption error from 45.7 to 31.2, and reduces ImageNet-P mean flip rate from 27.8 to 16.1. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images. We iterate this process by putting the student back as the teacher. During the generation of the pseudo labels, the teacher is not noised, so that the pseudo labels are as accurate as possible. During the learning of the student, however, we inject noise such as data augmentation, dropout, and stochastic depth into the student, so that the noised student is forced to learn harder from the pseudo labels.
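To make the teacher-student loop concrete, here is a minimal PyTorch sketch of the iterative self-training procedure. It is an illustration under stated assumptions, not the paper's implementation: the small MLP stands in for EfficientNet, Gaussian input jitter stands in for the paper's data augmentation (RandAugment), and all sizes, data, and hyperparameters are toy placeholders.

```python
# Minimal sketch of Noisy Student self-training (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(width: int, p_drop: float) -> nn.Module:
    # Stand-in for EfficientNet; dropout is one of the student noise sources.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(32 * 32 * 3, width),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(width, 10),
    )

@torch.no_grad()
def pseudo_label(teacher: nn.Module, unlabeled: torch.Tensor) -> torch.Tensor:
    teacher.eval()  # teacher is NOT noised: clean inputs, dropout disabled
    return F.softmax(teacher(unlabeled), dim=-1)  # soft pseudo labels

def train_student(student, labeled_x, labeled_y, unlabeled_x, soft_y, steps=100):
    student.train()  # student IS noised: dropout active during training
    opt = torch.optim.SGD(student.parameters(), lr=0.1)
    for _ in range(steps):
        # Input noise (the paper uses RandAugment) approximated by jitter.
        noisy_u = unlabeled_x + 0.1 * torch.randn_like(unlabeled_x)
        # Train on the combination of labeled and pseudo-labeled images.
        loss = F.cross_entropy(student(labeled_x), labeled_y)
        loss = loss + F.cross_entropy(student(noisy_u), soft_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return student

# Toy data standing in for labeled ImageNet and the 300M unlabeled images.
labeled_x, labeled_y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
unlabeled_x = torch.randn(256, 3, 32, 32)

# Initial teacher trained on labeled data only (no dropout, one-hot targets).
teacher = train_student(make_model(256, 0.0), labeled_x, labeled_y,
                        labeled_x, F.one_hot(labeled_y, 10).float())
for width in (384, 512):  # each iteration, the student grows and becomes the teacher
    soft_y = pseudo_label(teacher, unlabeled_x)
    student = make_model(width, 0.5)  # larger, noised student
    teacher = train_student(student, labeled_x, labeled_y, unlabeled_x, soft_y)
```

Note the asymmetry the abstract describes: `pseudo_label` runs the teacher in eval mode on clean inputs, while `train_student` keeps dropout active and perturbs the unlabeled batch, so only the student sees noise.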