In this paper, we propose a novel guided diffusion purification approach that provides a strong defense against adversarial attacks. Our model achieves 89.62% robust accuracy under the PGD-L_inf attack (eps = 8/255) on the CIFAR-10 dataset. We first explore the essential connections between unguided diffusion models and randomized smoothing, which enables us to apply the models to certified robustness. The empirical results show that our models outperform randomized smoothing by 5% when the certified L2 radius r is larger than 0.5.
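For readers unfamiliar with the certified L2 radius mentioned above, the following is a minimal sketch of how that radius is commonly computed in standard Gaussian randomized smoothing: r = sigma * Phi^{-1}(p_lower), where Phi^{-1} is the inverse standard normal CDF. It is not the method of this paper. The function name `certified_l2_radius` and the parameters `sigma` (noise standard deviation) and `p_lower` (a lower confidence bound on the top-class probability under noise) are assumptions introduced for illustration.

```python
from statistics import NormalDist


def certified_l2_radius(sigma: float, p_lower: float) -> float:
    """Certified L2 radius under standard Gaussian randomized smoothing.

    sigma   -- standard deviation of the Gaussian noise (hypothetical name)
    p_lower -- lower confidence bound on the probability that the base
               classifier returns the top class under noise (hypothetical name)

    Returns 0.0 when p_lower <= 0.5, i.e. when nothing can be certified.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not 0.0 < p_lower < 1.0:
        raise ValueError("p_lower must lie strictly between 0 and 1")
    if p_lower <= 0.5:
        return 0.0
    # r = sigma * Phi^{-1}(p_lower), with Phi the standard normal CDF.
    return sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    # With sigma = 0.5, the radius exceeds 0.5 only when p_lower > Phi(1).
    for p in (0.6, 0.85, 0.99):
        print(p, certified_l2_radius(0.5, p))
```

Under this bound, certifying a radius larger than 0.5 with sigma = 0.5 requires the top-class probability bound to exceed Phi(1), roughly 0.84, which illustrates why larger radii are the harder regime to certify.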