Adversarial defenses are naturally evaluated on their ability to tolerate adversarial attacks. To test defenses, diverse adversarial attacks are crafted, which are usually described in terms of their evasion capability and their L0, L1, L2, and Linf norms. We question whether evasion capability and L-norms are the most informative basis for claiming that defenses have been tested against a representative attack set. To this end, we select image quality metrics from the state of the art and search for correlations between image perturbation and detectability. We observe that computing L-norms alone is rarely the preferable solution. We observe a strong correlation between the identified metrics computed on an adversarial image and the output of a detector on that image, to the extent that the metrics can predict the detector's response with approximately 0.94 accuracy. Further, we observe that the metrics can group attacks by similar perturbations and similar detectability. This suggests revisiting how detectors are evaluated, so that additional metrics are included to ensure that a representative attack dataset is selected.
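To make the quantities above concrete, the following is a minimal sketch of how the four L-norms of a perturbation, together with one image quality metric, might be computed for a clean/adversarial image pair. The abstract does not name the selected metrics, so the use of PSNR here is an assumption chosen purely for illustration; the function names `perturbation_norms` and `psnr`, the `tol` and `max_value` parameters, and the flat-list image representation are likewise hypothetical.

```python
import math


def perturbation_norms(clean, adversarial, tol=1e-12):
    """Return the L0, L1, L2 and Linf norms of (adversarial - clean).

    Both inputs are flat sequences of pixel values on the same scale.
    `tol` is an illustrative threshold below which a pixel counts as unchanged.
    """
    if len(clean) != len(adversarial):
        raise ValueError("images must have the same number of pixels")
    delta = [a - c for c, a in zip(clean, adversarial)]
    return {
        "L0": sum(1 for d in delta if abs(d) > tol),    # number of changed pixels
        "L1": sum(abs(d) for d in delta),               # total absolute change
        "L2": math.sqrt(sum(d * d for d in delta)),     # Euclidean distance
        "Linf": max((abs(d) for d in delta), default=0.0),  # largest single change
    }


def psnr(clean, adversarial, max_value=1.0):
    """Peak signal-to-noise ratio in dB (assumed example of a quality metric)."""
    mse = sum((a - c) ** 2 for c, a in zip(clean, adversarial)) / len(clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel "image" with values in [0, 1]; two pixels are perturbed.
clean = [0.0, 0.5, 1.0, 0.25]
adversarial = [0.0, 0.75, 1.0, 0.0]

norms = perturbation_norms(clean, adversarial)
quality = psnr(clean, adversarial)
print(norms, quality)
```

In an evaluation along the lines described above, a vector of such values per adversarial image could serve as the input features from which a detector's response is predicted; how the metrics are actually combined is not specified in the abstract.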