In this paper we propose a new method for exemplar-free class-incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining the plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which can help recalibrate previous-task classifiers to the feature drift that occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which may not be possible for some applications. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Secondly, we propose cascaded feature drift compensation, a new method that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our method outperforms existing exemplar-free state-of-the-art methods without the need to store any representative exemplars of past tasks.
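To make the gating idea more concrete, below is a minimal, hypothetical PyTorch sketch of a mask-gated class-attention block. The module name GatedClassAttention, the learnable sigmoid gate, and its placement on the block's output are illustrative assumptions for exposition only; the paper's actual gate granularity, parameterization, and training procedure may differ.

```python
import torch
import torch.nn as nn


class GatedClassAttention(nn.Module):
    """Hypothetical sketch: class-attention (query from the class token only)
    whose output update is modulated by a per-dimension gate, so dimensions
    important to previous tasks can be shielded from drift."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)        # query from the class token
        self.kv = nn.Linear(dim, dim * 2)   # keys/values from all tokens
        self.proj = nn.Linear(dim, dim)
        # Illustrative gate: sigmoid(gate) close to 1 suppresses the update of
        # that output dimension, limiting drift for previously learned tasks.
        self.gate = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D); token 0 is the class token
        B, N, D = x.shape
        H = self.num_heads
        q = self.q(x[:, :1]).reshape(B, 1, H, D // H).transpose(1, 2)
        k, v = self.kv(x).reshape(B, N, 2, H, D // H).permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        cls = (attn @ v).transpose(1, 2).reshape(B, 1, D)
        # Mask-based gating of the residual update to the class token.
        g = torch.sigmoid(self.gate)
        return x[:, :1] + (1.0 - g) * self.proj(cls)
```

Under this reading, gate values near one protect the feature dimensions that previous-task classifiers rely on, while gate values near zero leave full plasticity for learning new tasks.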