Deep neural networks are susceptible to poisoning attacks, in which training data are deliberately polluted with samples carrying specific triggers. Because existing attacks mainly pursue a high attack success rate with patch-based samples, defense algorithms can easily detect the poisoned samples. We propose DeepPoison, a novel adversarial network consisting of one generator and two discriminators, to address this problem. Specifically, the generator automatically extracts the target class's hidden features and embeds them into benign training samples. One discriminator controls the ratio of the poisoning perturbation; the other acts as the target model to verify the poisoning effect. The novelty of DeepPoison lies in that the generated poisoned training samples are indistinguishable from benign ones by both defensive methods and manual visual inspection, and even benign test samples can trigger the attack. Extensive experiments show that DeepPoison achieves a state-of-the-art attack success rate of up to 91.74% with only 7% poisoned samples on the publicly available LFW and CASIA datasets. Furthermore, we have evaluated DeepPoison against high-performance defense algorithms, such as autoencoder-based defense and DBSCAN cluster detection, and demonstrated its resilience.
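To make the one-generator/two-discriminator design concrete, the following is a minimal PyTorch sketch of the training objective it implies: a generator produces a bounded perturbation, a stealthiness discriminator penalizes any detectable difference from benign data, and a stand-in target model rewards misclassification toward the attacker's class. All class names, network architectures, the perturbation bound `eps`, and the equal loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a DeepPoison-style setup, assuming PyTorch.
# Architectures, eps, and loss weights are placeholders, not the paper's.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Embeds target-class features into a benign image via a
    small, bounded additive perturbation."""
    def __init__(self, channels=3, eps=0.05):
        super().__init__()
        self.eps = eps  # assumed cap on perturbation magnitude
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        # Keep the perturbation small so poisoned samples stay visually benign.
        return torch.clamp(x + self.eps * self.net(x), 0.0, 1.0)

class StealthDiscriminator(nn.Module):
    """Scores how distinguishable poisoned samples are from benign ones,
    constraining the ratio of the poisoning perturbation."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.net(x)  # real/fake logit

# The second discriminator stands in for the victim (target) model;
# this placeholder classifier is an assumption for the sketch.
target_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

def generator_loss(gen, stealth_disc, target_model, benign_x, target_class):
    """Combined objective: stay indistinguishable from benign data while
    steering the target model toward the attacker's chosen class."""
    poisoned = gen(benign_x)
    # Term 1: fool the stealth discriminator (poisoned looks benign).
    stealth = nn.functional.binary_cross_entropy_with_logits(
        stealth_disc(poisoned), torch.ones(benign_x.size(0), 1))
    # Term 2: target model assigns poisoned inputs to the target class.
    attack = nn.functional.cross_entropy(
        target_model(poisoned),
        torch.full((benign_x.size(0),), target_class, dtype=torch.long))
    return stealth + attack  # equal weighting is an assumption
```

In an actual training loop, the stealth discriminator would be updated adversarially against the generator, while the target-model discriminator verifies the poisoning effect on each batch.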