Class Incremental Learning (CIL) aims at learning a multi-class classifier in a phase-by-phase manner, where data of only a subset of the classes are provided at each phase. Previous works mainly focus on mitigating forgetting in phases after the initial one. However, we find that improving CIL at its initial phase is also a promising direction. Specifically, we experimentally show that directly encouraging the CIL learner at the initial phase to output representations similar to those of the model jointly trained on all classes can greatly boost the CIL performance. Motivated by this, we study the difference between a na\"ively-trained initial-phase model and the oracle model. Specifically, since one major difference between these two models is the number of training classes, we investigate how this difference affects the model representations. We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD), which effectively regularizes the representations of each class to scatter more uniformly, thus mimicking the model jointly trained on all classes (i.e., the oracle model). Our CwD is simple to implement and easy to plug into existing methods. Extensive experiments on various benchmark datasets show that CwD consistently and significantly improves the performance of existing state-of-the-art methods by around 1\% to 3\%. Code will be released.
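To make the idea concrete, below is a minimal PyTorch-style sketch of one way a class-wise decorrelation penalty could be implemented, based only on the description above (per-class correlation of mini-batch features penalized toward the identity-free case); the function name \texttt{cwd\_loss}, the normalization, and the weighting are our illustrative assumptions, not the authors' released code.

\begin{verbatim}
import torch

def cwd_loss(features, labels, eps=1e-8):
    # Sketch of a class-wise decorrelation penalty (assumed form).
    # For each class in the mini-batch, standardize its features per
    # dimension, build the (d x d) correlation matrix, and penalize its
    # normalized squared Frobenius norm so per-class representations
    # scatter more uniformly across feature dimensions.
    losses = []
    for c in labels.unique():
        z = features[labels == c]                 # (n_c, d)
        if z.size(0) < 2:                         # need >= 2 samples
            continue
        z = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)
        corr = z.t() @ z / z.size(0)              # (d, d) correlation
        losses.append(corr.pow(2).sum() / corr.numel())
    return torch.stack(losses).mean() if losses else features.new_zeros(())

# Hypothetical usage in the initial CIL phase:
#   total_loss = ce_loss + lambda_cwd * cwd_loss(feats, labels)
\end{verbatim}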