We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings in which model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset.
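To make the two-stage workflow concrete, the following is a minimal toy sketch on synthetic 2-D data. It uses a nearest-neighbour classifier and a naive label-flipping heuristic as a stand-in for the paper's actual (clean-label, image-based) poison and camouflage construction; all function names, parameters, and data here are illustrative assumptions, not the paper's method. The sketch only illustrates the pipeline: with both poison and camouflage present the target is still classified correctly, and retraining without the camouflage (i.e., after the unlearning request) unleashes the attack.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Clean two-class training data and a single target test point of true class 1.
X_clean = np.vstack([rng.normal(-1.0, 1.0, (200, 2)),   # class 0 cluster
                     rng.normal(+1.0, 1.0, (200, 2))])  # class 1 cluster
y_clean = np.array([0] * 200 + [1] * 200)
x_target = np.array([[1.2, 1.0]])

def craft(center, label, n, scale):
    """Placeholder crafting step: points with the given label clustered near `center`.

    Hypothetical stand-in; unlike the paper's attack, this heuristic is not clean-label.
    """
    return center + rng.normal(0.0, scale, (n, 2)), np.full(n, label)

# Poison points push the target toward class 0; tighter camouflage points
# (carrying the target's true label 1) locally mask that effect until removed.
X_poi, y_poi = craft(x_target, 0, 30, scale=0.15)
X_cam, y_cam = craft(x_target, 1, 30, scale=0.03)

def fit(*parts):
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return KNeighborsClassifier(n_neighbors=5).fit(X, y)

clean_model    = fit((X_clean, y_clean))
masked_model   = fit((X_clean, y_clean), (X_poi, y_poi), (X_cam, y_cam))
unlearnt_model = fit((X_clean, y_clean), (X_poi, y_poi))  # camouflage "unlearned"

print("clean model       :", clean_model.predict(x_target)[0])     # typically 1
print("poison+camouflage :", masked_model.predict(x_target)[0])    # typically 1
print("after unlearning  :", unlearnt_model.predict(x_target)[0])  # typically 0
```

Retraining from scratch on the reduced dataset models exact unlearning; the same workflow applies when removal is handled by an approximate unlearning procedure instead.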