Representation learning has been significantly advanced by contrastive learning methods. Most of these methods benefit from data augmentations that are carefully designed to preserve instance identity, so that images transformed from the same instance can still be retrieved. However, such carefully designed transformations prevent us from exploring the novel patterns exposed by other, stronger transformations. Meanwhile, as found in our experiments, strong augmentations distort the images' structures, making retrieval difficult. We therefore propose a general framework called Contrastive Learning with Stronger Augmentations~(CLSA) to complement current contrastive learning approaches. Specifically, the distribution divergence between weakly and strongly augmented images over a representation bank is used to supervise the retrieval of strongly augmented queries from a pool of instances. Experiments on ImageNet and downstream datasets show that the information from strongly augmented images can significantly boost performance. For example, with a standard ResNet-50 architecture and a single-layer fine-tuned classifier, CLSA achieves a top-1 accuracy of 76.2% on ImageNet, which is nearly the same level as the 76.5% of the supervised result. The code and pre-trained models are available at https://github.com/maple-research-lab/CLSA.
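To make the supervision signal concrete, below is a minimal sketch of the distribution-divergence idea described above, assuming a PyTorch-style setup: each view's similarities to a representation bank are turned into a softmax distribution, and the weak view's distribution supervises the strong view's. The names `ddm_loss`, `weak_q`, `strong_q`, `bank`, and `tau` are hypothetical placeholders, not the official CLSA implementation.

```python
import torch
import torch.nn.functional as F

def ddm_loss(weak_q, strong_q, bank, tau=0.2):
    """Distribution divergence between a weak and a strong view.

    weak_q:   (N, D) L2-normalized embeddings of weakly augmented queries
    strong_q: (N, D) L2-normalized embeddings of strongly augmented queries
    bank:     (K, D) L2-normalized representation bank (e.g. a memory queue)
    """
    # Similarity distributions of each view over the representation bank.
    p_weak = F.softmax(weak_q @ bank.t() / tau, dim=1)            # target
    log_p_strong = F.log_softmax(strong_q @ bank.t() / tau, dim=1)
    # Cross-entropy between the two distributions; the weak view supervises
    # the strong one, so no gradient flows through the target.
    return -(p_weak.detach() * log_p_strong).sum(dim=1).mean()

# Toy usage with random embeddings (shapes are illustrative only).
N, K, D = 8, 1024, 128
weak = F.normalize(torch.randn(N, D), dim=1)
strong = F.normalize(torch.randn(N, D), dim=1)
queue = F.normalize(torch.randn(K, D), dim=1)
print(ddm_loss(weak, strong, queue))
```

In this sketch the weak view plays the role of a soft teacher over the bank, which is how the abstract's "distribution divergence" supervision differs from a standard one-hot contrastive loss.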