Cluster discrimination is an effective pretext task for unsupervised representation learning that typically consists of two phases: clustering and discrimination. Clustering assigns each instance a pseudo-label, which is then used to learn representations in the discrimination phase. The main challenge resides in clustering, since prevalent clustering methods (e.g., k-means) have to run in a batch mode and admit a trivial solution in which a single cluster dominates. To address these challenges, we first investigate the objective of clustering-based representation learning. Based on this, we propose a novel clustering-based pretext task with online Constrained K-means (CoKe). Unlike balanced clustering, where every cluster has exactly the same size, we constrain only the minimal size of each cluster, so as to flexibly capture the inherent data structure. More importantly, our online assignment method comes with a theoretical guarantee of approaching the global optimum. By decoupling clustering and discrimination, CoKe achieves competitive performance even when optimizing with only a single view of each instance. Extensive experiments on ImageNet verify both the efficacy and efficiency of our proposal. Code will be released.
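To make the minimum-size constraint concrete, here is a minimal sketch of an online assignment step that keeps every cluster above a given size. This is an illustrative greedy heuristic under assumed inputs (`features`, `centers`, `min_size`), not the paper's exact CoKe update, which relies on dual variables and carries the stated optimality guarantee; the sketch only preserves feasibility of the constraint while instances arrive one at a time.

```python
import numpy as np

def online_constrained_assign(features, centers, min_size):
    """Greedy online assignment with a per-cluster minimum-size constraint.

    Illustrative sketch (not the paper's CoKe algorithm): each instance
    goes to its most similar center, except when the number of remaining
    instances exactly matches the total shortfall of under-sized clusters,
    in which case the choice is restricted to those clusters. This keeps
    the constraint `counts[j] >= min_size` satisfiable and, at the end,
    satisfied (assuming len(features) >= k * min_size).
    """
    k = centers.shape[0]
    counts = np.zeros(k, dtype=int)
    labels = np.empty(len(features), dtype=int)
    remaining = len(features)
    for i, x in enumerate(features):
        sim = centers @ x  # similarity to each center; higher is better
        deficit = np.maximum(min_size - counts, 0)
        if remaining == deficit.sum():
            # Just enough instances left to fill the under-sized clusters:
            # mask out every cluster that has already met its minimum.
            sim = np.where(deficit > 0, sim, -np.inf)
        j = int(np.argmax(sim))
        labels[i] = j
        counts[j] += 1
        remaining -= 1
    return labels, counts
```

Assigning to an under-sized cluster shrinks both the remaining budget and the total deficit by one, so the invariant `remaining >= deficit.sum()` holds throughout and every cluster ends at or above `min_size`.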