A recent source of concern for the security of neural networks is the emergence of clean-label dataset poisoning attacks, wherein correctly labeled poison samples are injected into the training dataset. While these poison samples look legitimate to a human observer, they contain malicious characteristics that trigger a targeted misclassification at inference time. We propose a scalable and transferable clean-label poisoning attack against transfer learning, which crafts poison images whose center in feature space lies close to the target image. Our attack, Bullseye Polytope, improves the attack success rate of the current state of the art by 26.75% in end-to-end transfer learning, while increasing attack speed by a factor of 12. We further extend Bullseye Polytope to a more practical attack model by including multiple images of the same object (e.g., taken from different angles) when crafting the poison samples. We demonstrate that this extension improves attack transferability to unseen images of the same object by over 16%, without using extra poison samples.
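To make the feature-space intuition concrete, the sketch below illustrates the core optimization: perturb a set of base images, within a small L-infinity budget, so that the mean ("bullseye" center) of their feature representations matches the target image's features. This is a minimal single-network sketch under stated assumptions, not the authors' released implementation; the function name `craft_bullseye_poisons` and the hyperparameter values are hypothetical, and the paper's full method additionally averages the loss over an ensemble of substitute networks.

```python
# Minimal sketch of the Bullseye Polytope objective (illustrative, not the
# authors' code): drive the centroid of the poisons' features onto the
# target's feature vector while keeping each poison visually clean.
import torch

def craft_bullseye_poisons(feature_extractor, base_images, target_image,
                           epsilon=8 / 255, steps=500, lr=0.04):
    """Perturb base_images (within an L-inf ball of radius epsilon) so the
    centroid of their features matches the target image's feature vector.
    `feature_extractor` is assumed to be a frozen pretrained network whose
    output is the penultimate-layer feature; inputs are in [0, 1]."""
    feature_extractor.eval()
    with torch.no_grad():
        target_feat = feature_extractor(target_image.unsqueeze(0)).squeeze(0)

    delta = torch.zeros_like(base_images, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        poison_feats = feature_extractor(base_images + delta)
        center = poison_feats.mean(dim=0)  # "bullseye" center of the poison set
        # Normalized squared distance between the poison centroid and the target.
        loss = torch.norm(center - target_feat) ** 2 / torch.norm(target_feat) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)  # stay within the perturbation budget
            # Project back so poison pixels remain valid images in [0, 1].
            delta.copy_((base_images + delta).clamp(0, 1) - base_images)

    return (base_images + delta).detach()
```

In the multi-image extension described above, `target_feat` would be replaced by the mean feature of several images of the same target object, which is what lets the crafted poisons transfer to unseen views without extra poison samples.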