We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. We present a new approach called AutoNovel to address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using only the labelled data introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use ranking statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data and the clustering of the unlabelled data. Moreover, we propose a method to estimate the number of classes when the number of new categories is not known a priori. We evaluate AutoNovel on standard classification benchmarks and substantially outperform current methods for novel category discovery. We also show that AutoNovel can be used for fully unsupervised image clustering, achieving promising results.
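To make idea (2) concrete, the following is a minimal sketch of how ranking statistics could produce pairwise pseudo-labels for unlabelled images. It rests on an assumption that the abstract does not spell out: two images are treated as belonging to the same class when the sets of their k most strongly activated feature dimensions coincide. The function names, the toy feature vectors, and the choice k = 2 are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def top_k_indices(features: Sequence[float], k: int) -> frozenset:
    """Return the set of indices of the k largest feature values."""
    ranked = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    return frozenset(ranked[:k])


def pairwise_pseudo_labels(batch: Sequence[Sequence[float]], k: int = 2) -> List[List[int]]:
    """Label a pair 1 when both images share the same top-k index set, else 0.

    Assumption: comparing top-k index sets (ignoring order within the set)
    is the ranking statistic used to decide whether two images match.
    """
    tops = [top_k_indices(f, k) for f in batch]
    return [[1 if a == b else 0 for b in tops] for a in tops]


# Toy, made-up feature vectors for three unlabelled images.
features = [
    [0.9, 0.1, 0.8, 0.0],   # strongest dimensions: 0 and 2
    [0.7, 0.2, 0.95, 0.1],  # strongest dimensions: 2 and 0
    [0.0, 0.9, 0.1, 0.8],   # strongest dimensions: 1 and 3
]
labels = pairwise_pseudo_labels(features, k=2)
print(labels)
```

In this toy batch the first two images share the same top-2 dimensions and would be paired as positives, while the third would be a negative for both. Pairwise labels of this kind could then supervise a clustering head on the unlabelled data, for example through a binary cross-entropy loss on pairwise similarity scores; that loss choice is likewise an assumption here.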