Novel class discovery (NCD) aims to learn a model that transfers common knowledge from a labelled dataset to a class-disjoint unlabelled dataset and discovers the new classes (clusters) within it. Many methods, together with elaborate training pipelines and carefully designed objectives, have been proposed and have considerably boosted performance on NCD tasks. Nevertheless, we find that existing methods do not sufficiently exploit the essence of the NCD setting. To this end, in this paper we propose to model both inter-class and intra-class constraints in NCD based on the symmetric Kullback-Leibler divergence (sKLD). Specifically, we propose an inter-class sKLD constraint to effectively exploit the disjoint relationship between labelled and unlabelled classes, enforcing separability between different classes in the embedding space. In addition, we present an intra-class sKLD constraint to explicitly constrain the relationship between a sample and its augmentations, while also stabilising the training process. We conduct extensive experiments on the popular CIFAR10, CIFAR100 and ImageNet benchmarks and demonstrate that our method establishes a new state of the art, achieving significant improvements over previous state-of-the-art methods, e.g., 3.5%/3.7% gains in clustering accuracy on the CIFAR100-50 dataset split under the task-aware/task-agnostic evaluation protocols. Code is available at https://github.com/FanZhichen/NCD-IIC.
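For readers unfamiliar with the symmetric Kullback-Leibler divergence underlying both constraints, below is a minimal PyTorch sketch of the sKLD quantity between two predicted class distributions. The function name, the epsilon smoothing, and the batch averaging are illustrative assumptions, not the authors' implementation; see the repository above for the actual code.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(logits_p: torch.Tensor, logits_q: torch.Tensor,
                 eps: float = 1e-8) -> torch.Tensor:
    """Symmetric KL divergence between two categorical distributions.

    A minimal sketch: sKLD(p, q) = KL(p || q) + KL(q || p), averaged over
    the batch. How this quantity is applied (inter-class, between labelled
    and unlabelled samples, vs. intra-class, between a sample and its
    augmentations) follows the paper, not this snippet.
    """
    p = F.softmax(logits_p, dim=-1)  # predicted distribution p
    q = F.softmax(logits_q, dim=-1)  # predicted distribution q
    log_p = torch.log(p + eps)       # eps avoids log(0); an assumption here
    log_q = torch.log(q + eps)
    kl_pq = torch.sum(p * (log_p - log_q), dim=-1)  # KL(p || q)
    kl_qp = torch.sum(q * (log_q - log_p), dim=-1)  # KL(q || p)
    return (kl_pq + kl_qp).mean()
```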