The transferability of adversarial examples is a key issue in the security of deep neural networks. The possibility that an adversarial example crafted for a source model fools another, targeted model makes the threat of adversarial attacks more realistic. Measuring transferability is a crucial problem, but the Attack Success Rate alone does not provide a sound evaluation. This paper proposes a new methodology for evaluating transferability that gives distortion a central role. This new tool shows that transferable attacks may perform far worse than a black-box attack if the attacker randomly picks the source model. To address this issue, we propose a new selection mechanism, called FiT, which aims to choose the best source model with only a few preliminary queries to the target. Our experimental results show that FiT is highly effective at selecting the best source model in multiple scenarios such as single-model attacks, ensemble-model attacks and multiple attacks (code available at: https://github.com/t-maho/transferability_measure_fit).
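To illustrate the general idea of spending a small query budget on the target to pick a source model, here is a minimal, hypothetical sketch; it is not the FiT metric from the paper. The helper names (`select_source_model`, `fgsm`, `target_query`) and the choice of a one-step FGSM surrogate attack are assumptions made for the example only.

```python
import torch


def fgsm(model, x, y, eps):
    """One-step FGSM perturbation crafted with white-box access to `model`
    (a stand-in for any white-box attack run on the source model)."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()


def select_source_model(source_models, target_query, x, y, n_queries=5):
    """Hypothetical sketch: rank candidate source models by probing the
    black-box target with a handful of adversarial examples crafted on each.

    source_models: list of white-box surrogate models (callables returning logits)
    target_query:  black-box access to the target, returning predicted labels
    x, y:          a small batch of correctly classified inputs and their labels
    n_queries:     preliminary query budget spent per candidate source model
    """
    scores = []
    for model in source_models:
        # Craft adversarial examples on the candidate source model.
        x_adv = fgsm(model, x[:n_queries], y[:n_queries], eps=4 / 255)
        # Spend the query budget: fraction of examples that fool the target.
        fooled = (target_query(x_adv) != y[:n_queries]).float().mean().item()
        scores.append(fooled)
    # Keep the source model whose adversarial examples transfer best.
    return source_models[int(torch.tensor(scores).argmax())]
```

In this toy version the selection score is simply the empirical fooling rate on the probed target; the paper's point is that a sound selection should also account for the distortion needed to transfer, which this sketch ignores.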