We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined by a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. In this way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that a new task can be learned without forgetting past tasks. This approach differs from most continual learning algorithms, which typically rely on replay buffers, weight regularization, or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method, equipped with a prototypical loss, is capable of learning and retaining knowledge about past tasks across a variety of scenarios, including learning from mini-batches, task-incremental learning, and class-incremental learning.
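The sketch below is a minimal conceptual illustration (not the authors' implementation) of the recursive weight-generation loop described above: the CNN weights generated for task t are fed back into the HT together with the support set of task t+1. The `ToyHyperTransformer` stub, the tensor shapes, and the pooling of the support set are hypothetical simplifications so the example runs end to end; the real HT is a Transformer that attends over support-set embeddings and the previously generated CNN weights.

```python
# Conceptual sketch of Continual HyperTransformer-style recursive weight generation.
# All module names, dimensions, and data here are illustrative placeholders.
import torch
import torch.nn as nn


class ToyHyperTransformer(nn.Module):
    """Stand-in for the HyperTransformer: maps (previously generated CNN
    weights, support set of the current task) -> updated CNN weights.
    Implemented as a small MLP over pooled inputs purely for illustration."""

    def __init__(self, weight_dim: int, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(weight_dim + feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, weight_dim),
        )

    def forward(self, prev_weights: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
        # Pool the few-shot support set into a single task embedding
        # (a hypothetical simplification of attending over support samples).
        task_embedding = support.mean(dim=0)
        return self.net(torch.cat([prev_weights, task_embedding], dim=-1))


weight_dim, feature_dim, shots = 128, 64, 5
ht = ToyHyperTransformer(weight_dim, feature_dim)

# Continual sequence of tasks: weights generated for task t become the
# input weights for task t + 1, so they accumulate knowledge of past tasks.
theta = torch.zeros(weight_dim)  # initial (empty) CNN weights
for t in range(4):  # four sequential few-shot tasks, for illustration
    support_set = torch.randn(shots, feature_dim)  # placeholder episode
    theta = ht(theta, support_set)  # theta now represents tasks 0..t
    print(f"task {t}: generated weight norm = {theta.norm():.3f}")
```

In training, the generated weights for each step would parameterize a CNN evaluated on query sets from the current and all previous tasks (e.g., with the prototypical loss mentioned above), so that the HT learns to update weights without erasing earlier tasks.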