Recently, self-supervised representation learning has driven further progress in multimedia technology. Most existing self-supervised learning methods are designed for offline (packaged) data. However, when applied to streaming data, they suffer from catastrophic forgetting, a problem that has not been studied extensively. In this paper, we make the first attempt to tackle the catastrophic forgetting problem in mainstream self-supervised methods, i.e., contrastive learning methods. Specifically, we first develop a rehearsal-based framework combined with a novel sampling strategy and a self-supervised knowledge distillation to transfer information over time efficiently. Then, we propose an extra sample queue to help the network separate the feature representations of old and new data in the embedding space. Experimental results show that, compared with a naive self-supervised baseline that learns tasks one by one without any forgetting-mitigation technique, we improve image classification accuracy by 1.60% on CIFAR-100, 2.86% on ImageNet-Sub, and 1.29% on ImageNet-Full under the 10-incremental-step setting. Our code will be available at https://github.com/VDIGPKU/ContinualContrastiveLearning.
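To make the two components named above more concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: it assumes a MoCo-style InfoNCE loss whose negatives are drawn from both the usual queue and an extra queue holding features of old (rehearsed) data, plus a simple feature-level distillation term against a frozen copy of the previous encoder. All names (`extra_queue`, `anchors`, the temperatures) are illustrative assumptions.

```python
# Hypothetical sketch of (1) contrastive learning with an extra queue of
# old-data features and (2) self-supervised knowledge distillation.
import torch
import torch.nn.functional as F


def contrastive_loss_with_extra_queue(q, k, neg_queue, extra_queue, tau=0.2):
    """InfoNCE whose negatives come from both the standard queue and an
    extra queue of old-task features (sketch of the old/new separation idea)."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    negatives = F.normalize(torch.cat([neg_queue, extra_queue], dim=0), dim=1)
    l_pos = (q * k).sum(dim=1, keepdim=True)           # (N, 1) positive logits
    l_neg = q @ negatives.t()                           # (N, K + K_extra)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)   # positive is index 0
    return F.cross_entropy(logits, labels)


def self_supervised_distillation(feat_new, feat_old, anchors, tau=0.2):
    """Match the new encoder's similarity distribution over shared anchors
    to that of the frozen old encoder (one common form of feature-level KD)."""
    p_old = F.softmax(F.normalize(feat_old, dim=1) @ anchors.t() / tau, dim=1)
    log_p_new = F.log_softmax(F.normalize(feat_new, dim=1) @ anchors.t() / tau, dim=1)
    return F.kl_div(log_p_new, p_old, reduction="batchmean")


if __name__ == "__main__":
    # Toy shapes only; real training would feed encoder outputs here.
    N, D, K = 8, 128, 256
    q, k = torch.randn(N, D), torch.randn(N, D)
    neg_queue, extra_queue = torch.randn(K, D), torch.randn(K, D)
    old_feat = torch.randn(N, D)                        # frozen old encoder's features
    anchors = F.normalize(torch.randn(64, D), dim=1)
    loss = (contrastive_loss_with_extra_queue(q, k, neg_queue, extra_queue)
            + self_supervised_distillation(q, old_feat, anchors))
    print(float(loss))
```

In this sketch, rehearsal corresponds to keeping old-data features available (here, the `extra_queue` and `old_feat` tensors); how those samples are selected (the paper's sampling strategy) is not shown.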