Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks, where an attacker poisons a small part of the unlabeled data by adding a trigger (an image patch chosen by the attacker) to the images. The model performance is good on clean test images, but the attacker can manipulate the decision of the model by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning, and to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in self-supervised learning, since the use of large unlabeled data makes data inspection to remove poisons prohibitive. We show that in our targeted attack, the attacker can produce many false positives for the target category by using the trigger at test time. We also propose a knowledge distillation-based defense algorithm that succeeds in neutralizing the attack. Our code is available at https://github.com/UMBCvision/SSL-Backdoor.
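To make the poisoning step concrete, below is a minimal Python sketch of the idea described above: pasting an attacker-chosen trigger patch onto a small fraction of unlabeled training images. The trigger file, poison rate, patch size, and random placement here are illustrative assumptions, not the authors' exact configuration; see the repository for the actual attack code.

```python
# Illustrative sketch (assumed parameters): paste a trigger patch onto a
# random subset of unlabeled images before self-supervised pretraining.
import os
import random
from PIL import Image

def poison_images(image_paths, trigger_path, out_dir,
                  poison_rate=0.005, patch_size=50, seed=0):
    """Copy images to out_dir, pasting the trigger onto a random subset."""
    rng = random.Random(seed)
    os.makedirs(out_dir, exist_ok=True)
    trigger = Image.open(trigger_path).convert("RGB").resize((patch_size, patch_size))
    num_poisoned = max(1, int(poison_rate * len(image_paths)))
    poisoned_ids = set(rng.sample(range(len(image_paths)), k=num_poisoned))
    for i, path in enumerate(image_paths):
        img = Image.open(path).convert("RGB")
        if i in poisoned_ids:
            # Paste the trigger at a random location inside the image.
            x = rng.randint(0, max(0, img.width - patch_size))
            y = rng.randint(0, max(0, img.height - patch_size))
            img.paste(trigger, (x, y))
        img.save(os.path.join(out_dir, os.path.basename(path)))
```

A self-supervised pretraining run on the poisoned directory then proceeds exactly as on clean data, which is why inspecting large unlabeled datasets for such poisons is impractical.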