Unsupervised representation learning with contrastive learning has achieved great success. This line of methods duplicates each training batch to construct contrastive pairs, so every batch and its augmented version must be forwarded simultaneously, incurring additional computation. In this paper, we propose a new jigsaw clustering pretext task that forwards only each training batch itself, reducing the training cost. Our method exploits both intra-image and inter-image information, and outperforms previous single-batch-based methods by a large margin. It is even comparable to contrastive learning methods when only half of the training batches are used. Our method indicates that multiple batches during training are not necessary, and opens the door for future research on single-batch unsupervised methods. Our models trained on the ImageNet dataset achieve state-of-the-art results under linear classification, outperforming previous single-batch methods by 2.6%. Models transferred to the COCO dataset outperform MoCo v2 by 0.4% with only half of the training batches. Our pretrained models outperform supervised ImageNet-pretrained models on the CIFAR-10 and CIFAR-100 datasets by 0.9% and 4.1%, respectively. Code is available at https://github.com/Jia-Research-Lab/JigsawClustering.
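To make the single-batch claim concrete, the sketch below illustrates one way a jigsaw-style batch can be built: each image is cut into a 2x2 grid of patches, the patches are shuffled across the batch and re-assembled into the same number of montage images, and the source-image and location labels needed for a clustering objective fall out of the shuffle for free, so only one forward pass per batch is required. This is a minimal illustration of the idea, not the authors' implementation; the 2x2 grid, the tensor shapes, and the helper name `build_jigsaw_batch` are assumptions made for this example.

```python
# Illustrative sketch of single-batch jigsaw construction (assumed details,
# not the authors' code): split, shuffle, and reassemble within one batch.
import torch


def build_jigsaw_batch(images: torch.Tensor):
    """images: (n, c, h, w) with h and w divisible by 2."""
    n, c, h, w = images.shape
    ph, pw = h // 2, w // 2

    # Cut every image into a 2x2 grid of patches -> (n*4, c, ph, pw).
    patches = (
        images.unfold(2, ph, ph).unfold(3, pw, pw)   # (n, c, 2, 2, ph, pw)
        .permute(0, 2, 3, 1, 4, 5)
        .reshape(n * 4, c, ph, pw)
    )
    src_image = torch.arange(n).repeat_interleave(4)  # which image a patch came from
    location = torch.arange(4).repeat(n)              # which grid cell it occupied

    # Shuffle patches across the whole batch, then stitch every consecutive
    # group of four patches back into one montage image of the original size.
    perm = torch.randperm(n * 4)
    shuffled = patches[perm]
    montage = (
        shuffled.reshape(n, 2, 2, c, ph, pw)
        .permute(0, 3, 1, 4, 2, 5)
        .reshape(n, c, h, w)
    )
    # A single forward pass over `montage` suffices; src_image[perm] and
    # location[perm] provide the clustering / location targets.
    return montage, src_image[perm], location[perm]
```

Under this construction the batch size seen by the network stays at n, whereas contrastive methods that duplicate the batch must forward 2n images per step.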