We question the current evaluation practice for diffusion-based purification methods. Diffusion-based purification methods aim to remove adversarial effects from an input data point at test time. The approach has gained increasing attention as an alternative to adversarial training because it decouples training from testing. Well-known white-box attacks are often employed to measure the robustness of purification. However, it is unknown whether these attacks are the most effective against diffusion-based purification, since they are often tailored to adversarial training. We analyze the current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. Based on our analysis, we further propose a new purification strategy that shows competitive results against state-of-the-art adversarial training approaches.