Recent advances in representation learning inspire us to tackle the challenging problem of unsupervised image classification in a principled way. We propose ContraCluster, an unsupervised image classification method that combines clustering with the power of contrastive self-supervised learning. ContraCluster consists of three stages: (1) contrastive self-supervised pre-training (CPT), (2) contrastive prototype sampling (CPS), and (3) prototype-based semi-supervised fine-tuning (PB-SFT). CPS selects highly accurate, categorically prototypical images in the embedding space learned by contrastive learning. We then use the sampled prototypes as noisy labeled data for semi-supervised fine-tuning (PB-SFT), leveraging a small set of prototypes and a large amount of unlabeled data to further improve accuracy. We demonstrate empirically that ContraCluster achieves new state-of-the-art results on standard benchmark datasets including CIFAR-10, STL-10, and ImageNet-10. For example, ContraCluster achieves about 90.8% accuracy on CIFAR-10, outperforming DAC (52.2%), IIC (61.7%), and SCAN (87.6%) by a large margin. Without any labels, ContraCluster's 90.8% accuracy is comparable to the 95.8% achieved by the best supervised counterpart.
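As an illustrative sketch only (the abstract does not specify how CPS is implemented), selecting "categorically prototypical" samples in a contrastive embedding space can be approximated by clustering L2-normalized embeddings and taking, for each cluster, the points nearest its centroid as prototypes. The function below is a hypothetical stand-in, not the paper's actual CPS procedure; it assumes k-means with a deterministic farthest-point initialization.

```python
import numpy as np

def sample_prototypes(embeddings, n_clusters, n_prototypes, n_iters=50):
    """Hypothetical CPS-like sketch: cluster L2-normalized embeddings with
    k-means and return, per cluster, the indices of the n_prototypes points
    closest to that cluster's centroid (the most "prototypical" samples)."""
    # Normalize, since contrastive losses typically compare unit vectors.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Farthest-point initialization: deterministic and well spread out.
    centroids = [X[0]]
    for _ in range(1, n_clusters):
        dists = np.min(
            np.stack([np.linalg.norm(X - c, axis=1) for c in centroids]), axis=0
        )
        centroids.append(X[dists.argmax()])
    centroids = np.stack(centroids)

    # Standard Lloyd iterations: assign points, then recompute centroids.
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)

    # The points nearest each centroid serve as that cluster's prototypes,
    # which could then be treated as noisy labeled data for fine-tuning.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    prototypes = {}
    for k in range(n_clusters):
        idx = np.where(labels == k)[0]
        prototypes[k] = idx[np.argsort(d[idx, k])][:n_prototypes].tolist()
    return prototypes
```

In a pipeline like the one the abstract describes, the returned prototype indices (with their cluster ids as pseudo-labels) would feed the semi-supervised fine-tuning stage alongside the remaining unlabeled data.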