Discovering novel concepts from unlabelled data in a continuous manner is an important desideratum of lifelong learners. In the literature, such problems have been only partially addressed, and under very restricted settings: either labelled data is provided to aid the discovery of novel concepts (e.g., NCD), or learning occurs for only a limited number of incremental steps (e.g., class-iNCD). In this work we challenge the status quo and propose a more demanding and practical learning paradigm called MSc-iNCD, where learning occurs continuously and without supervision, while exploiting the rich priors of large-scale pre-trained models. To this end, we propose simple baselines that are not only resilient under longer learning scenarios, but also surprisingly strong when compared with sophisticated state-of-the-art methods. We conduct an extensive empirical evaluation on a multitude of benchmarks and show the effectiveness of our proposed baselines, which significantly raise the bar.