Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a Continual Learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for Continual self-supervised visual representation Learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach empirically by training six popular self-supervised models in various CL settings.
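To make the core idea concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of how a self-supervised loss can be reused as a distillation mechanism: a predictor network maps the current representations onto those produced by a frozen copy of the model from the previous task, and the same self-supervised objective scores the match. The names `Encoder`, `Predictor`, and `ssl_loss` are hypothetical placeholders, and the negative-cosine objective merely stands in for any of the self-supervised losses mentioned in the abstract.

```python
# Minimal sketch of turning a self-supervised loss into a distillation loss
# for continual learning. All module names and the toy loss are placeholders.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy backbone standing in for a real feature extractor (e.g. a ResNet)."""
    def __init__(self, in_dim=32, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, x):
        return self.net(x)


class Predictor(nn.Module):
    """Maps current representations to the space of the frozen past model."""
    def __init__(self, feat_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))

    def forward(self, z):
        return self.net(z)


def ssl_loss(z1, z2):
    """Placeholder self-supervised objective (negative cosine similarity with
    a stop-gradient on the target); any SSL loss could be plugged in here."""
    return -F.cosine_similarity(z1, z2.detach(), dim=-1).mean()


# --- one training step on the current task ---
encoder = Encoder()
frozen_past = copy.deepcopy(encoder)          # snapshot of the model after the previous task
for p in frozen_past.parameters():
    p.requires_grad_(False)
predictor = Predictor()
opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.1)

x1, x2 = torch.randn(8, 32), torch.randn(8, 32)   # two augmented views of a batch

z1, z2 = encoder(x1), encoder(x2)                 # current representations
loss_ssl = ssl_loss(z1, z2)                       # standard self-supervised term

with torch.no_grad():
    z1_past = frozen_past(x1)                     # representations in their past state
loss_distill = ssl_loss(predictor(z1), z1_past)   # same loss reused for distillation

loss = loss_ssl + loss_distill
opt.zero_grad()
loss.backward()
opt.step()
```

Because the distillation term reuses the self-supervised objective itself, no extra loss-specific hyperparameters are introduced, which is consistent with the claim that the framework needs little to no hyperparameter tuning.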