While adversarial training is considered a standard defense against adversarial attacks on image classifiers, adversarial purification, which purifies attacked images into clean images with a standalone purification model, has shown promise as an alternative defense. Recently, an Energy-Based Model (EBM) trained with Markov chain Monte Carlo (MCMC) has been highlighted as a purification model, where an attacked image is purified by running a long Markov chain using the gradients of the EBM. Yet the practicality of adversarial purification with an EBM remains questionable because the number of MCMC steps required for such purification is too large. In this paper, we propose a novel adversarial purification method based on an EBM trained with Denoising Score Matching (DSM). We show that an EBM trained with DSM can purify attacked images within a few steps. We further introduce a simple yet effective randomized purification scheme that injects random noise into images before purification. The injected noise screens the adversarial perturbations imposed on the images and brings the images into the regime where the EBM can denoise well. We show that our purification method is robust against various attacks and demonstrate its state-of-the-art performance.
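The randomized purification idea described above (inject noise first, then take a few denoising steps along the EBM gradient) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `toy_score` and `purify`, the fixed step size, the noise level, and above all the closed-form Gaussian score that stands in for a learned DSM-trained network.

```python
import numpy as np

def toy_score(x, mu, data_std, noise_std):
    """Hypothetical stand-in for a learned score (negative energy gradient).

    Closed-form score of Gaussian data N(mu, data_std^2 I) after it has been
    smoothed with Gaussian noise of standard deviation noise_std.
    """
    return -(x - mu) / (data_std**2 + noise_std**2)

def purify(x_attacked, score_fn, noise_std, step_size, n_steps, rng):
    """Inject random noise, then take a few deterministic steps along the score."""
    # Randomization step: the noise is meant to screen the perturbation and
    # move the input to a noise level the model can denoise.
    x = x_attacked + noise_std * rng.standard_normal(x_attacked.shape)
    # A few denoising steps (assumed fixed step size, for illustration only).
    for _ in range(n_steps):
        x = x + step_size * score_fn(x)
    return x

rng = np.random.default_rng(0)
mu = np.zeros(16)                                   # toy data mode
x_clean = mu + 0.1 * rng.standard_normal(16)        # toy "clean image"
# Signed perturbation used as a stand-in for an adversarial attack.
x_attacked = x_clean + 0.5 * np.sign(rng.standard_normal(16))

score_fn = lambda x: toy_score(x, mu, data_std=0.1, noise_std=0.25)
x_purified = purify(x_attacked, score_fn, noise_std=0.25,
                    step_size=0.05, n_steps=5, rng=rng)

print("distance to clean before:", np.linalg.norm(x_attacked - x_clean))
print("distance to clean after: ", np.linalg.norm(x_purified - x_clean))
```

In this toy setting the score simply pulls the noised input back toward the single data mode, so the purified point ends up closer to the clean sample than the attacked one. With a real image model, `score_fn` would be replaced by the trained network, and the step size and number of steps would need tuning.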