In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. To do so, we instantiate the denoised smoothing approach of Salman et al. by combining a pretrained denoising diffusion probabilistic model with a standard high-accuracy classifier. This allows us to certify 71% accuracy on ImageNet under adversarial perturbations constrained to a 2-norm of at most 0.5, an improvement of 14 percentage points over the prior certified state of the art using any approach, and of 30 percentage points over prior denoised smoothing. We obtain these results using only pretrained diffusion models and image classifiers, without any fine-tuning or retraining of model parameters.
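
As a rough illustration of the denoised smoothing pipeline described above, the following is a minimal sketch and not the paper's implementation. It assumes the standard randomized smoothing recipe: add Gaussian noise to the input, denoise, classify, and take a majority vote, then convert a lower bound on the top-class probability into a certified 2-norm radius via sigma * Phi^{-1}(p_lower). The names `smoothed_predict`, `certified_radius`, `denoise`, and `classify` are hypothetical; in practice the denoiser would be a pretrained diffusion model and the classifier a pretrained image classifier, and `p_lower` would come from a binomial confidence bound rather than being passed in directly.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(x, denoise, classify, sigma, n_samples, rng):
    """Majority vote of classify(denoise(x + noise)) over Gaussian noise draws.

    `denoise` and `classify` are placeholders (assumptions) for a pretrained
    denoiser and a pretrained classifier. Returns (label, vote_fraction).
    """
    votes = Counter()
    for _ in range(n_samples):
        # Perturb every coordinate with i.i.d. N(0, sigma^2) noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classify(denoise(noisy))] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def certified_radius(p_lower, sigma):
    """Certified 2-norm radius sigma * Phi^{-1}(p_lower).

    `p_lower` is assumed to be a valid lower confidence bound on the
    probability of the top class, strictly below 1. Returns 0.0 (abstain)
    when the bound does not exceed 1/2.
    """
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0] * 16                                # toy "image"
    identity_denoiser = lambda v: v               # stand-in denoiser (assumption)
    sign_classifier = lambda v: int(sum(v) > 0)   # stand-in classifier (assumption)
    label, frac = smoothed_predict(x, identity_denoiser, sign_classifier,
                                   sigma=0.25, n_samples=200, rng=rng)
    print(label, frac, certified_radius(0.9, sigma=0.25))
```

Under these assumptions, a larger noise level `sigma` permits a larger certified radius for the same `p_lower`, but it also makes the denoising and classification step harder, which is why the quality of the pretrained denoiser matters.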