Although the adoption rate of deep neural networks (DNNs) has tremendously increased in recent years, a solution to their vulnerability against adversarial examples has not yet been found. As a result, substantial research efforts are dedicated to fixing this weakness, with many studies typically using a subset of source images to generate adversarial examples, treating every image in this subset as equally suited. We demonstrate that, in fact, not every source image is equally suited for this kind of assessment. To do so, we devise a large-scale model-to-model transferability scenario for which we meticulously analyze the properties of adversarial examples, generated from every suitable source image in ImageNet by making use of two of the most frequently deployed attacks. In this transferability scenario, which involves seven distinct DNN models, including the recently proposed vision transformers, we reveal that it is possible to have a difference of up to $12.5\%$ in model-to-model transferability success, $1.01$ in average $L_2$ perturbation, and $0.03$ ($8/255$) in average $L_{\infty}$ perturbation when $1,000$ source images are sampled randomly among all suitable candidates. We then take one of the first steps in evaluating the robustness of images used to create adversarial examples, proposing a number of simple but effective methods to identify unsuitable source images, thus making it possible to mitigate extreme cases in experimentation and support high-quality benchmarking.
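As a minimal sketch of how the reported quantities could be computed, the snippet below measures model-to-model transferability success and the average $L_2$ and $L_{\infty}$ perturbation of a batch of adversarial examples. It assumes PyTorch tensors and pretrained classifiers; the function names and the exact success criterion (misclassification on the target model) are illustrative assumptions, not the paper's exact protocol.

```python
import torch

def transfer_success_rate(target_model, adv_images, labels):
    # Fraction of adversarial examples (crafted on a source model) that
    # are misclassified by the target model. Hypothetical helper; the
    # paper's precise evaluation protocol may differ.
    target_model.eval()
    with torch.no_grad():
        preds = target_model(adv_images).argmax(dim=1)
    return (preds != labels).float().mean().item()

def perturbation_stats(clean_images, adv_images):
    # Average L2 and L_inf perturbation over the batch, computed per image.
    delta = (adv_images - clean_images).flatten(start_dim=1)
    avg_l2 = delta.norm(p=2, dim=1).mean().item()
    avg_linf = delta.abs().max(dim=1).values.mean().item()
    return avg_l2, avg_linf
```

Under this kind of measurement, repeatedly drawing random subsets of $1,000$ source images and recomputing these statistics is what exposes the subset-to-subset variation the abstract reports.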