Randomized smoothing (RS) is a well-known certified defense against adversarial attacks. It creates a smoothed classifier that, at inference time, predicts the most likely class under random noise perturbations of the input. While initial work focused on robustness to $\ell_2$ norm perturbations using noise sampled from a Gaussian distribution, subsequent works have shown that different noise distributions can yield robustness to other $\ell_p$ norm bounds as well. In general, a specific noise distribution is optimal for defending against a given $\ell_p$-norm-based attack. In this work, we aim to improve certified adversarial robustness against multiple perturbation bounds simultaneously. Towards this, we first present a novel \textit{certification scheme} that effectively combines the certificates obtained using different noise distributions to obtain optimal results against multiple perturbation bounds. We further propose a novel \textit{training noise distribution} along with a \textit{regularized training scheme} to improve certification within both $\ell_1$ and $\ell_2$ perturbation norms simultaneously. Contrary to prior works, we compare the certified robustness of different training algorithms at the same natural (clean) accuracy, rather than at fixed noise levels used for training and certification. We also empirically refute the argument that training and certifying the classifier with the same amount of noise gives the best results. The proposed approach achieves improvements on the ACR (Average Certified Radius) metric across both $\ell_1$ and $\ell_2$ perturbation bounds.
Title: Certified Adversarial Robustness Within Multiple Perturbation Bounds
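To make the smoothing step and the ACR metric from the abstract concrete, the following is a minimal sketch and not the method proposed in this work. It assumes Gaussian noise and uses the standard $\ell_2$ radius $\sigma\,\Phi^{-1}(p_A)$ from the original Gaussian randomized smoothing analysis. The function names (`smoothed_predict`, `l2_radius`, `average_certified_radius`) and the toy base classifier are hypothetical. For brevity the sketch plugs in the empirical vote frequency for $p_A$; a valid certificate would instead use a high-confidence lower bound on $p_A$.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x.

    Returns the most frequent class and its empirical frequency.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    top_class, top_count = votes.most_common(1)[0]
    return top_class, top_count / n_samples


def l2_radius(p_a, sigma):
    """l2 radius sigma * Phi^{-1}(p_a) for Gaussian smoothing.

    Assumption: p_a is used as given. A real certificate needs a
    statistical lower confidence bound on p_a, not a point estimate.
    """
    if p_a <= 0.5:
        return 0.0  # no majority, so nothing can be certified
    p_a = min(p_a, 1.0 - 1e-6)  # keep inv_cdf finite when all votes agree
    return sigma * NormalDist().inv_cdf(p_a)


def average_certified_radius(results):
    """ACR: mean radius over samples, counting misclassified samples as 0.

    results is a list of (radius, is_correct) pairs.
    """
    total = sum(radius for radius, is_correct in results if is_correct)
    return total / len(results)


def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return 1 if x[0] > 0 else 0


rng = random.Random(0)
pred, p_hat = smoothed_predict(toy_classifier, [2.0, 0.0], 0.5, 1000, rng)
radius = l2_radius(p_hat, sigma=0.5)
```

Counting misclassified samples as radius 0 in `average_certified_radius` means the metric rewards both accuracy and certified radius, which is why comparing methods at matched clean accuracy, as the abstract proposes, changes how ACR numbers should be read.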