Contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.005% of a dataset (e.g., just 150 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of less than 0.0001% of the dataset (e.g., just two out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.
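To make the targeted poisoning threat concrete, here is a minimal sketch of how an adversary could inject a handful of poisoned image-caption pairs into a web-scraped training set. All names here (the TSV layout, `TARGET_IMAGE`, `ADVERSARIAL_LABEL`, file paths) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: inject two poisoned image-caption pairs into a
# CLIP-style training set stored as a (image path, caption) TSV file.
import csv
import shutil

TARGET_IMAGE = "target.jpg"             # test image the adversary wants misclassified (assumed path)
ADVERSARIAL_LABEL = "a photo of a dog"  # label the adversary wants the model to predict (assumed)

# Captions a zero-shot classifier would associate with the adversarial label.
poison_captions = [
    f"{ADVERSARIAL_LABEL} sitting on the grass",
    f"{ADVERSARIAL_LABEL} in the park",
]

# The attack only appends a few (image, caption) pairs: each reuses the target
# image but pairs it with a caption describing the adversarial label. At the
# scale of Conceptual Captions (~3 million examples), two such pairs amount to
# roughly 0.0001% of the data.
with open("train_captions.tsv", "a", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for i, caption in enumerate(poison_captions):
        poisoned_path = f"poison_{i}.jpg"
        shutil.copy(TARGET_IMAGE, poisoned_path)   # duplicate the target image
        writer.writerow([poisoned_path, caption])  # inject the poisoned pair
```

A backdoor attack would follow the same recipe, except that the injected images carry a small trigger patch rather than being copies of a single target image.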