Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. We study the performance of the resulting models in various transfer-learning settings, including zero-shot transfer. We also compare our models with those obtained via large-scale self-supervised learning. We find our weakly supervised models to be very competitive across all settings, and find that they substantially outperform their self-supervised counterparts. We also investigate whether our models learned potentially troubling associations or stereotypes. Overall, our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems. Our models, Supervised Weakly through hashtAGs (SWAG), are publicly available.