We present a method for provably defending any pretrained image classifier against $\ell_p$ adversarial attacks. This method, for instance, allows public vision API providers and users to seamlessly convert pretrained non-robust classification services into provably robust ones. By prepending a custom-trained denoiser to any off-the-shelf image classifier and using randomized smoothing, we effectively create a new classifier that is guaranteed to be $\ell_p$-robust to adversarial examples, without modifying the pretrained classifier. Our approach applies to both the white-box and the black-box settings of the pretrained classifier. We refer to this defense as denoised smoothing, and we demonstrate its effectiveness through extensive experimentation on ImageNet and CIFAR-10. Finally, we use our approach to provably defend the Azure, Google, AWS, and ClarifAI image classification APIs. Our code replicating all the experiments in the paper can be found at: https://github.com/microsoft/denoised-smoothing.
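To make the pipeline concrete, here is a minimal, hypothetical sketch of the prediction step: Gaussian noise is added to the input, a custom-trained denoiser removes it, and the unmodified pretrained classifier votes on the denoised samples, following the randomized-smoothing procedure of Cohen et al. (2019). The function and argument names below are illustrative, not the API of the official implementation at https://github.com/microsoft/denoised-smoothing.

```python
import torch

def smoothed_predict(denoiser, classifier, x, sigma=0.25, n_samples=1000):
    """Monte Carlo estimate of the smoothed classifier's prediction.

    Hypothetical sketch: `denoiser` and `classifier` are assumed to be
    torch.nn.Module instances; the pretrained classifier is never modified.
    x: a single image tensor of shape (C, H, W), values in [0, 1].
    sigma: std. dev. of the isotropic Gaussian smoothing noise (should match
           the noise level the denoiser was trained to remove).
    """
    counts = None
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = x + sigma * torch.randn_like(x)   # perturb the input
            denoised = denoiser(noisy.unsqueeze(0))   # strip the noise
            logits = classifier(denoised)             # query the base model
            pred = logits.argmax(dim=1)
            if counts is None:
                counts = torch.zeros(logits.shape[1], dtype=torch.long)
            counts[pred] += 1
    # The class chosen most often under noise is the smoothed prediction;
    # certification additionally lower-bounds the l2 radius from these counts.
    return counts.argmax().item()
```

In the black-box setting, `classifier` would simply wrap calls to an external API such as the ones listed above, since only its predictions are needed.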