The adversarial machine learning literature is largely partitioned into evasion attacks on testing data and poisoning attacks on training data. In this work, we show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning. Our findings indicate that adversarial examples, when assigned the original label of their natural base image, cannot be used to train a classifier for natural images. Furthermore, when adversarial examples are assigned their adversarial class label, they are useful for training. This suggests that adversarial examples contain useful semantic content, just with the ``wrong'' labels (according to a network, but not a human). Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release, and we release a poisoned version of ImageNet, ImageNet-P, to encourage research into the strength of this form of data obfuscation.
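As a rough illustration of what crafting such poisons involves, the sketch below perturbs training images with untargeted $\ell_\infty$ PGD against a fixed pre-trained crafting network and returns them for release with their original clean labels. It is a minimal sketch under assumed conventions: the function name, the PyTorch setting, and the budget, step size, and iteration count are illustrative choices, not the exact recipe or hyperparameters used to produce ImageNet-P.

    # Minimal sketch of untargeted adversarial poisoning with PGD (illustrative only).
    # Assumes a pre-trained PyTorch classifier `model`, images in [0, 1], and their labels.
    import torch
    import torch.nn.functional as F

    def craft_adversarial_poison(model, images, labels, eps=8/255, step=2/255, iters=40):
        """Perturb `images` with L-infinity PGD against `model`.

        The perturbed images are intended to be released with their original
        (clean) labels, which is what makes them act as poisons for training.
        """
        model.eval()
        delta = torch.zeros_like(images, requires_grad=True)
        for _ in range(iters):
            loss = F.cross_entropy(model(images + delta), labels)
            loss.backward()
            with torch.no_grad():
                # Gradient ascent on the loss of the fixed crafting network.
                delta += step * delta.grad.sign()
                # Project back onto the L-infinity ball of radius eps.
                delta.clamp_(-eps, eps)
                # Keep the poisoned image a valid image in [0, 1].
                delta.copy_((images + delta).clamp(0, 1) - images)
            delta.grad.zero_()
        return (images + delta).detach()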