Diffusion models have recently been employed to improve certified robustness through denoising. However, a theoretical understanding of why diffusion models improve certified robustness is still lacking, which hinders further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. This deeper understanding allows us to propose a new method, DensePure, designed to improve the certified robustness of a pretrained model (i.e., a classifier). Given an (adversarial) input, DensePure runs the reverse process of the diffusion model multiple times with different random seeds to obtain multiple reversed samples, passes each of them through the classifier, and takes a majority vote over the inferred labels to make the final prediction. This design of using multiple denoising runs is informed by our theoretical analysis of the conditional distribution of the reversed sample. Specifically, when the data density of a clean sample is high, its conditional density under the reverse process of a diffusion model is also high; thus, sampling from this conditional distribution can purify the adversarial example and return the corresponding clean sample with high probability. By using the highest-density point of the conditional distribution as the reversed sample, we identify the robust region of a given instance under the diffusion model's reverse process. We show that this robust region is a union of multiple convex sets and is potentially much larger than the robust regions identified in previous works. In practice, DensePure can approximate the label of the high-density region of the conditional distribution, and can thereby enhance certified robustness.
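The prediction procedure described above (several reverse-process runs with different seeds, classification of each reversed sample, then a majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote_predict`, its parameters, and the toy stand-ins `toy_reverse_sample` and `toy_classifier` are hypothetical names introduced only for illustration. A real setup would plug in a pretrained diffusion model's reverse process and a pretrained classifier.

```python
import math
import random
from collections import Counter


def majority_vote_predict(x, reverse_sample, classifier, num_runs=10, base_seed=0):
    """Classify x by voting over several independently denoised copies.

    reverse_sample(x, rng) and classifier(x_rev) are caller-supplied
    callables (hypothetical stand-ins for a diffusion model's reverse
    process and a pretrained classifier).
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    votes = []
    for run in range(num_runs):
        rng = random.Random(base_seed + run)  # a different random seed per run
        x_rev = reverse_sample(x, rng)        # one reversed (purified) sample
        votes.append(classifier(x_rev))       # label inferred for that sample
    # Most frequent label wins; ties go to the label seen first.
    label, _count = Counter(votes).most_common(1)[0]
    return label, votes


# Toy stand-ins (assumptions for illustration only): two high-density
# points in 2D, a "reverse process" that lands near the closest one,
# and a classifier that thresholds the first coordinate.
CENTERS = [(-2.0, 0.0), (2.0, 0.0)]


def toy_reverse_sample(x, rng):
    nearest = min(CENTERS, key=lambda c: math.dist(c, x))
    return tuple(c + rng.gauss(0.0, 0.1) for c in nearest)


def toy_classifier(x):
    return int(x[0] > 0.0)


if __name__ == "__main__":
    label, votes = majority_vote_predict(
        (0.5, 0.3), toy_reverse_sample, toy_classifier, num_runs=5
    )
    print(label, votes)
```

In this toy setting, every run lands near the same high-density point, so the vote is unanimous. With a real stochastic reverse process, individual runs may disagree, which is the case the majority vote is meant to handle.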