Motivated by previous observations that the commonly used $L_p$ norms ($p=1,2,\infty$) do not capture the perceptual quality of adversarial examples in image classification, we propose to replace these norms with the structural similarity index (SSIM) measure, which was originally developed to quantify the perceptual similarity of images. Through extensive experiments with adversarially trained classifiers for MNIST and CIFAR-10, we demonstrate that our SSIM-constrained adversarial attacks can break state-of-the-art adversarially trained classifiers and achieve success rates similar to or higher than those of the elastic net attack, while consistently producing adversarial images of better perceptual quality. By using SSIM to automatically identify and disallow adversarial images of low quality, we evaluate the performance of several defense schemes in a perceptually far more meaningful way than has been done previously in the literature.
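As a minimal illustration of the SSIM-based quality check mentioned above (not the paper's implementation), the sketch below rejects an adversarial image whose SSIM with respect to the original falls under a threshold. The function name `filter_adversarial` and the threshold `tau` are hypothetical choices for this example; the SSIM computation relies on scikit-image's `structural_similarity`.

```python
# Sketch only: filter out perceptually low-quality adversarial examples via SSIM.
# Assumes float images in [0, 1] with shape (H, W, C); tau is a user-chosen threshold.
import numpy as np
from skimage.metrics import structural_similarity as ssim


def filter_adversarial(original, adversarial, tau=0.95):
    """Return True if the adversarial image is perceptually close enough
    (SSIM >= tau) to the original to be kept as a valid attack."""
    score = ssim(original, adversarial, data_range=1.0, channel_axis=-1)
    return score >= tau


# Usage example with a small random image and a mild perturbation.
orig = np.random.rand(32, 32, 3)
adv = np.clip(orig + 0.01 * np.random.randn(32, 32, 3), 0.0, 1.0)
print(filter_adversarial(orig, adv))
```

The threshold value here is purely illustrative; in practice it would be chosen per dataset so that images below it are visibly distorted to human observers.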