Deep learning models are vulnerable to adversarial examples, which can fool a target classifier by imposing imperceptible perturbations onto natural examples. In this work, we consider the practical and challenging decision-based black-box adversarial setting, where the attacker can only obtain the final classification labels by querying the target model, without access to the model's internals. Under this setting, existing works often rely on heuristics and exhibit unsatisfactory performance. To better understand the rationale behind these heuristics and the limitations of existing methods, we propose to automatically discover decision-based adversarial attack algorithms. In our approach, we construct a search space using basic mathematical operations as building blocks and develop a random search algorithm that explores this space efficiently by incorporating several pruning techniques and intuitive priors inspired by program synthesis work. Although we use a small and fast model to evaluate attack algorithms efficiently during the search, extensive experiments demonstrate that the discovered algorithms are simple yet query-efficient when transferred to larger normal and defensive models on the CIFAR-10 and ImageNet datasets. They consistently achieve performance comparable to or better than state-of-the-art decision-based attack methods.
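The overall search procedure described above — sampling candidate algorithms built from basic mathematical operations, pruning uninformative candidates, and scoring each on a small, cheap proxy model — can be sketched roughly as follows. This is a minimal illustrative sketch only: the operation set, program representation, pruning rule, and scoring function are all stand-in assumptions, not the paper's actual search space or evaluation protocol.

```python
import random

# Hypothetical building blocks: basic binary math operations that are
# composed into a candidate update rule (names are illustrative).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": max,
    "min": min,
}

def random_program(length=3):
    """Sample a candidate algorithm as a sequence of operation names."""
    return [random.choice(sorted(OPS)) for _ in range(length)]

def evaluate(program, inputs):
    """Fold the operation sequence over scalar inputs; this stands in
    for scoring a candidate attack with a small, fast proxy model."""
    acc = inputs[0]
    for op, x in zip(program, inputs[1:]):
        acc = OPS[op](acc, x)
    return acc

def random_search(trials=100, seed=0):
    """Random search with a trivial pruning rule (illustrative only)."""
    random.seed(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        prog = random_program()
        # Pruning: skip candidates that reduce to a single trivial form.
        if len(set(prog)) == 1:
            continue
        score = evaluate(prog, [1.0, 2.0, 3.0, 4.0])
        if score > best_score:
            best, best_score = prog, score
    return best, best_score
```

In the actual method, the discovered candidate would then be transferred from the small proxy model to larger target models; here the fold over scalars merely illustrates the sample-prune-score loop.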