Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks, in which an attacker poisons a small part of the unlabeled data by adding a trigger (an image patch chosen by the attacker) to the images. The model performs well on clean test images, but the attacker can manipulate its decisions by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning, and to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in self-supervised learning, since the use of large unlabeled data makes data inspection to remove poisons prohibitive. We show that in our targeted attack, the attacker can produce many false positives for the target category by using the trigger at test time. We also propose a defense method based on knowledge distillation that succeeds in neutralizing the attack. Our code is available here: https://github.com/UMBCvision/SSL-Backdoor.
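To make the poisoning step concrete, below is a minimal illustrative sketch of pasting a trigger patch onto an unlabeled training image. The patch size, placement, and file names are assumptions for illustration only, not the exact settings used in the paper or the released code.

```python
# Illustrative sketch of patch-based poisoning (assumed parameters, not the
# paper's exact pipeline): paste a small attacker-chosen trigger patch at a
# random location of an unlabeled image to create a poisoned sample.
import random
from PIL import Image

def add_trigger(image_path, trigger_path, patch_size=50, output_path="poisoned.png"):
    """Paste a resized trigger patch at a random location inside the image."""
    img = Image.open(image_path).convert("RGB")
    trigger = Image.open(trigger_path).convert("RGB").resize((patch_size, patch_size))
    # Pick a random top-left corner so the patch stays fully inside the image.
    x = random.randint(0, max(0, img.width - patch_size))
    y = random.randint(0, max(0, img.height - patch_size))
    img.paste(trigger, (x, y))
    img.save(output_path)
    return output_path
```

In a targeted attack, such poisoned images would be drawn from the target category and mixed into the unlabeled training set; at test time the same trigger is shown to induce false positives for that category.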