Adversarial attacks pose a significant threat to the robustness of deep neural networks (DNNs). Despite the many defensive methods that have been proposed, DNNs remain vulnerable to poisoning attacks, in which attackers tamper with the original training data. To defend DNNs against such attacks, this work proposes a novel method that combines the defensive distillation mechanism with a denoising autoencoder (DAE). The technique reduces the distilled model's sensitivity to poisoning attacks by detecting poisoned adversarial inputs in the training data and reconstructing them. To assess the proposed method, we injected carefully crafted adversarial samples into the original training data. Our experimental findings show that the method successfully identifies and reconstructs the poisoned inputs while also improving the DNN's resilience. The proposed approach therefore provides a robust defense mechanism for DNNs in applications where data poisoning attacks are a concern, overcoming the limitation that poisoning adversarial attacks impose on the defensive distillation technique.
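To make the described pipeline concrete, the following is a minimal sketch, assuming a PyTorch setup: a DAE reconstructs each training sample, samples with unusually high reconstruction error are treated as suspected poison and replaced by their reconstructions, and the sanitized data is then used for defensive distillation. The autoencoder architecture, the error threshold, and the distillation temperature are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAutoencoder(nn.Module):
    """Simple fully connected DAE for flattened 28x28 images (illustrative)."""
    def __init__(self, dim=784, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def sanitize(dae, x, threshold=0.02):
    """Reconstruct inputs with the DAE; flag samples whose per-sample
    reconstruction error exceeds the (assumed) threshold as likely poisoned
    and replace them with their reconstructions."""
    with torch.no_grad():
        recon = dae(x)
        err = ((recon - x) ** 2).mean(dim=1)       # per-sample MSE
        poisoned = err > threshold                  # suspected-poison mask
        cleaned = torch.where(poisoned.unsqueeze(1), recon, x)
    return cleaned, poisoned

def distillation_loss(student_logits, teacher_logits, T=20.0):
    """Standard defensive-distillation objective: the student matches the
    teacher's softened class probabilities at temperature T."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
```

In this sketch the DAE would first be trained to reconstruct clean samples from noise-corrupted versions; at training time of the classifier, each batch is passed through `sanitize` before computing the teacher's (hard-label) loss and the student's `distillation_loss`, so that the distilled model never fits the suspected poisoned points directly.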