In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes. We exploit the peculiarities of NCD to build a new framework, named Neighborhood Contrastive Learning (NCL), to learn discriminative representations that are important to clustering performance. Our contribution is twofold. First, we find that a feature extractor trained on the labeled set generates representations in which a generic query sample and its neighbors are likely to share the same class. We exploit this observation to retrieve and aggregate pseudo-positive pairs with contrastive learning, thus encouraging the model to learn more discriminative representations. Second, we notice that most of the instances are easily discriminated by the network, contributing less to the contrastive loss. To overcome this issue, we propose to generate hard negatives by mixing labeled and unlabeled samples in the feature space. We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin (e.g., clustering accuracy +13% on CIFAR-100 and +8% on ImageNet).
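For concreteness, the sketch below illustrates the two ingredients in PyTorch under assumptions not stated in the abstract: l2-normalized features, an InfoNCE-style objective, and illustrative hyperparameters (`top_k`, `tau`, `lam`). It is a minimal toy version, not the paper's implementation.

```python
# Hypothetical sketch of the two ingredients described above, assuming
# l2-normalized features and an InfoNCE-style loss; top_k, tau, and lam
# are illustrative choices, not values taken from the paper.
import torch
import torch.nn.functional as F

def neighborhood_contrastive_loss(z_unlab, z_lab, top_k=5, tau=0.1, lam=0.6):
    """Toy version: each unlabeled feature is pulled toward its top-k
    nearest unlabeled neighbors (pseudo-positives) and pushed away from
    the remaining candidates, including hard negatives synthesized by
    mixing labeled and unlabeled features."""
    z_u = F.normalize(z_unlab, dim=1)   # (N_u, d) unlabeled features
    z_l = F.normalize(z_lab, dim=1)     # (N_l, d) labeled features

    # Hard negatives: convex mixes of random labeled/unlabeled pairs
    # in the feature space, re-normalized onto the unit sphere.
    idx_u = torch.randint(len(z_u), (len(z_l),))
    z_mix = F.normalize(lam * z_l + (1 - lam) * z_u[idx_u], dim=1)

    sim_uu = z_u @ z_u.t() / tau        # unlabeled-vs-unlabeled similarities
    sim_um = z_u @ z_mix.t() / tau      # unlabeled-vs-mixed-negative similarities

    # Mask self-similarity, then treat each query's top-k neighbors
    # as pseudo-positives (they likely share the query's class).
    eye = torch.eye(len(z_u), dtype=torch.bool)
    sim_uu = sim_uu.masked_fill(eye, float('-inf'))
    pos = sim_uu.topk(top_k, dim=1).values          # (N_u, k)

    # InfoNCE over all candidates (neighbors + mixed hard negatives).
    all_logits = torch.cat([sim_uu, sim_um], dim=1)
    log_denom = torch.logsumexp(all_logits, dim=1, keepdim=True)
    return -(pos - log_denom).mean()

# Example usage with random features.
loss = neighborhood_contrastive_loss(torch.randn(32, 128), torch.randn(16, 128))
print(loss.item())
```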