Contrastive learning pre-trains an image encoder on a large amount of unlabeled data so that the encoder can serve as a general-purpose feature extractor for various downstream tasks. In this work, we propose PoisonedEncoder, a data poisoning attack on contrastive learning. Specifically, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data such that the downstream classifiers built on the poisoned encoder for multiple target downstream tasks simultaneously classify attacker-chosen, arbitrary clean inputs as attacker-chosen, arbitrary classes. We formulate the attack as a bilevel optimization problem whose solution is the set of poisoning inputs, and we propose a contrastive-learning-tailored method to approximately solve it. Our evaluation on multiple datasets shows that PoisonedEncoder achieves high attack success rates while maintaining the testing accuracy of the downstream classifiers built upon the poisoned encoder for non-attacker-chosen inputs. We also evaluate five defenses against PoisonedEncoder: one pre-processing, three in-processing, and one post-processing defense. Our results show that these defenses can decrease the attack success rate of PoisonedEncoder, but they also sacrifice the utility of the encoder or require a large clean pre-training dataset.
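As background for the pre-training objective referenced above, the following is a minimal NumPy sketch of a SimCLR-style InfoNCE contrastive loss. It illustrates contrastive learning in general, not this paper's specific attack or encoder; the function name and shapes are assumptions chosen for the illustration.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """SimCLR-style InfoNCE loss over a batch of paired embeddings.

    z1, z2: (N, d) arrays holding embeddings of two augmented views of
    the same N unlabeled images. Each row's positive is its counterpart
    in the other array; the remaining 2N - 2 embeddings act as negatives.
    Returns the mean contrastive loss (lower when positives align).
    """
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # cosine similarity
    sim = z @ z.T / temperature                          # (2N, 2N) logits
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    # Index of each row's positive: the counterpart view in the other half.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    logits = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

A poisoning attack on this objective manipulates which inputs the loss pulls together: by injecting crafted pre-training inputs, the attacker influences the embeddings the encoder learns, which is what the abstract's bilevel formulation optimizes over.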