Image clustering is a particularly challenging computer vision task that aims to generate annotations without human supervision. Recent advances apply self-supervised learning strategies to image clustering by first learning valuable semantics and then clustering the image representations. These multi-phase algorithms, however, increase computation time, and their final performance depends on the first stage. By extending the self-supervised approach, we propose a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier network. Specifically, the proposed clustering objective employs mutual information and maximizes the dependency between the integrated discrete representation and a discrete probability distribution. The discrete probability distribution is derived through the self-supervised process by comparing the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual information across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques, with average accuracies of 89.1% and 49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively. Finally, the proposed method is also robust to parameter settings, making it readily applicable to other datasets.
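The objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the `temperature` value, and the batch-averaged joint distribution are all assumptions made for illustration. The sketch builds a discrete distribution by comparing latent vectors with a set of prototypes, then measures the mutual information between that distribution and a classifier's soft cluster assignments.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def prototype_distribution(latents, prototypes, temperature=0.1):
    """Soft assignment of each latent vector to the prototypes.

    Assumption: similarity is cosine similarity scaled by a temperature.
    latents: (N, D), prototypes: (M, D) -> returns (N, M), rows sum to 1.
    """
    z = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return softmax(z @ c.T / temperature, axis=1)


def mutual_information(p, q, eps=1e-12):
    """Mutual information (in nats) between two batches of soft assignments.

    Assumption: the joint distribution is estimated by averaging the
    per-sample outer products over the batch.
    p: (N, K) classifier outputs, q: (N, M) prototype distribution.
    """
    joint = p.T @ q / p.shape[0]               # (K, M), entries sum to 1
    marg_p = joint.sum(axis=1, keepdims=True)  # (K, 1)
    marg_q = joint.sum(axis=0, keepdims=True)  # (1, M)
    log_ratio = np.log(joint + eps) - np.log(marg_p + eps) - np.log(marg_q + eps)
    return float(np.sum(joint * log_ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(32, 16))       # hypothetical encoder outputs
    prototypes = rng.normal(size=(10, 16))    # hypothetical trainable prototypes
    q = prototype_distribution(latents, prototypes)
    p = softmax(rng.normal(size=(32, 10)))    # hypothetical classifier outputs
    print("mutual information:", mutual_information(p, q))
```

In a training loop, the negative of this quantity would serve as a loss to minimize, and under the multi-crop setting it could be averaged over pairs of views. Both choices are assumptions here, since the abstract does not spell out the exact formulation.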