Humans possess an innate ability to identify and differentiate instances that they are not familiar with, by leveraging and adapting the knowledge that they have acquired so far. Importantly, they achieve this without deteriorating the performance on their earlier learning. Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model to incrementally discover novel categories of instances from unlabeled data, while maintaining its performance on the previously seen categories. We propose 1) a method to generate pseudo-latent representations which act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information-based regularizer which enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier which aids generalized inference when the test data contains instances from both seen and unseen categories. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery. Our extensive evaluations reveal that existing models catastrophically forget previously seen categories while identifying novel ones, whereas our method effectively balances the two competing objectives. We hope our work will attract further research into this newly identified pragmatic problem setting.
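The abstract does not spell out the form of the mutual-information-based regularizer. Below is a minimal, illustrative sketch of one common entropy-based surrogate for maximizing the mutual information I(y; x) between inputs and cluster assignments, which encourages confident per-sample predictions while keeping cluster usage balanced. The function name, the PyTorch framing, and this particular decomposition are assumptions for illustration only, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def mutual_information_regularizer(logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Entropy-based MI surrogate: I(y; x) = H(E_x[p(y|x)]) - E_x[H(p(y|x))].

    Args:
        logits: (batch_size, num_novel_classes) scores for the unlabeled data
                (hypothetical novel-class head of the model).
        eps:    small constant for numerical stability inside the log.

    Returns:
        A scalar tensor to be maximized (e.g., subtracted from the total loss).
    """
    p = F.softmax(logits, dim=1)                                   # p(y|x) per sample
    marginal = p.mean(dim=0)                                       # batch estimate of p(y)
    h_marginal = -(marginal * (marginal + eps).log()).sum()        # H(p(y)): high => balanced clusters
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()       # E[H(p(y|x))]: low => confident assignments
    return h_marginal - h_conditional
```

In a hypothetical training loop, such a term would typically be weighted and subtracted from the supervised objective, e.g. `loss = ce_known - lambda_mi * mutual_information_regularizer(novel_logits)`, so that gradient descent maximizes the mutual-information term while minimizing the classification loss on seen categories.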