Large-scale unlabeled data has enabled recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (MoCo and BYOL) rely on the inductive bias that different augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks, where an attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images. The model performs well on clean test images, but the attacker can manipulate its decisions by presenting the trigger at test time. Backdoor attacks have been studied extensively in supervised learning and, to the best of our knowledge, we are the first to study them in self-supervised learning. Backdoor attacks are more practical in self-supervised learning since the unlabeled data is large and, as a result, inspecting the data to rule out poisoned samples is prohibitive. We show that in our targeted attack, the attacker can produce many false positives for the target category by presenting the trigger at test time. We also propose a knowledge distillation based defense algorithm that succeeds in neutralizing the attack. Our code is available here: https://github.com/UMBCvision/SSL-Backdoor.
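To make the poisoning step concrete, below is a minimal sketch of pasting a small trigger patch onto unlabeled images of the attacker's target category, assuming a PIL-based pipeline; the function name `paste_trigger`, the patch size, and the random placement are illustrative assumptions and not the exact procedure from the SSL-Backdoor repository.

```python
# Minimal sketch of trigger-based poisoning of unlabeled images.
# Assumption: patch size, placement, and function name are illustrative,
# not the exact settings used in the paper or repository.
import random
from PIL import Image

def paste_trigger(image: Image.Image, trigger: Image.Image,
                  patch_size: int = 50) -> Image.Image:
    """Paste a small trigger patch at a random location on a copy of the image."""
    poisoned = image.copy()
    patch = trigger.resize((patch_size, patch_size))
    # Choose a random top-left corner so the patch stays inside the image.
    max_x = max(0, poisoned.width - patch_size)
    max_y = max(0, poisoned.height - patch_size)
    x, y = random.randint(0, max_x), random.randint(0, max_y)
    poisoned.paste(patch, (x, y))
    return poisoned

# Targeted attack idea: poison only a fraction of unlabeled images belonging to
# the target category, then mix them back into the unlabeled training set used
# by a self-supervised method such as MoCo or BYOL.
```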