Despite the success of deep learning methods on instance segmentation, these models still suffer from catastrophic forgetting in continual learning scenarios. In this paper, our contributions to continual instance segmentation are threefold. First, we propose Y-knowledge distillation (Y-KD), a knowledge distillation strategy that shares a common feature extractor between the teacher and student networks. Since the teacher is also updated with new data in Y-KD, the increased plasticity yields new modules that are specialized in new classes. Second, our Y-KD approach is supported by a dynamic-architecture method that grows new modules for each task and uses all of them for inference with a unique instance segmentation head, which significantly reduces forgetting. Third, we complete our approach with checkpoint averaging, a simple method to manually balance the trade-off between performance on the different sets of classes, thereby increasing control over the model's behavior at no additional cost. These contributions are unified in a model that we name the Dynamic Y-KD network. We perform extensive experiments on several single-step and multi-step scenarios on Pascal-VOC, and we show that our approach outperforms previous methods on both past and new classes. For instance, compared to recent work, our method obtains +2.1% mAP on old classes in 15-1, +7.6% mAP on new classes in 19-1, and reaches 91.5% of the mAP obtained by joint training on all classes in 15-5.
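The checkpoint averaging mentioned above can be read as a parameter-wise interpolation between the weights obtained before and after learning a new task. The sketch below is a minimal illustration of that idea, assuming PyTorch state dicts; the helper name average_checkpoints and the mixing coefficient alpha are illustrative assumptions, not the paper's exact implementation.

import copy
import torch

def average_checkpoints(state_old, state_new, alpha=0.5):
    # Parameter-wise interpolation of two checkpoints.
    # alpha=1.0 keeps the old-task weights, alpha=0.0 keeps the new-task
    # weights; intermediate values trade off stability and plasticity.
    averaged = copy.deepcopy(state_new)
    for name, param_new in state_new.items():
        param_old = state_old[name]
        if torch.is_floating_point(param_new):
            averaged[name] = alpha * param_old + (1.0 - alpha) * param_new
        else:
            # Non-float buffers (e.g. BatchNorm step counters) are taken
            # from the newer checkpoint unchanged.
            averaged[name] = param_new
    return averaged

# Example usage (hypothetical variable names):
# model.load_state_dict(average_checkpoints(old_state_dict, new_state_dict, alpha=0.5))

Because alpha is a single scalar chosen after training, the balance between old and new classes can be adjusted without any retraining, which is the "no additional cost" property referred to in the abstract.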