Adversarial patch attacks, which craft the pixels within a confined region of the input image, demonstrate strong attack effectiveness in physical environments, even under noise or deformation. Existing certified defenses against adversarial patch attacks work well on small images such as those in the MNIST and CIFAR-10 datasets, but achieve very poor certified accuracy on higher-resolution images such as ImageNet. It is therefore urgent to design robust and effective defenses against this practical and harmful attack on industry-scale images. In this work, we propose a certified defense methodology that achieves high provable robustness on high-resolution images and greatly improves the practicality of certified defenses for real-world adoption. The key insight of our work is that an adversarial patch manipulates the prediction by exploiting localized superficial important neurons (SINs). We therefore leverage SIN-based DNN compression techniques to significantly improve certified accuracy, by reducing the adversarial-region search overhead and filtering out prediction noise. Our experimental results show that the certified accuracy increases from 36.3% (the state-of-the-art certified detection) to 60.4% on the ImageNet dataset, pushing certified defenses much closer to practical use.
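The SIN-filtering idea can be illustrated with a minimal sketch: score each activation by an importance proxy, treat the most important localized activations as superficial important neurons, and mask them before prediction. This is a hypothetical illustration, not the paper's actual method; the function name `mask_superficial_neurons`, the use of |activation| as the importance proxy, and the masking fraction are all assumptions.

```python
import numpy as np

def mask_superficial_neurons(activations, importance, top_frac=0.1):
    """Zero out the most 'superficially important' activations (assumed SINs).

    activations: (C, H, W) feature map from some layer
    importance:  (C, H, W) per-activation importance scores
                 (here |activation| is used as a toy proxy)
    top_frac:    fraction of activations treated as SINs and masked
    """
    flat = importance.ravel()
    k = max(1, int(top_frac * flat.size))
    threshold = np.partition(flat, -k)[-k]   # k-th largest score
    keep = importance < threshold            # drop the top-k candidate SINs
    return activations * keep

# Toy usage: an 8-channel 4x4 feature map with random activations.
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 4, 4))
imp = np.abs(acts)                           # assumed importance proxy
masked = mask_superficial_neurons(acts, imp, top_frac=0.1)
```

In a real pipeline, the importance score would come from the compression technique (e.g., activation-times-gradient saliency), and masking the SINs both shrinks the space of candidate adversarial regions to search and suppresses the patch's influence on the prediction.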