We perform a comprehensive study of the performance of derivative-free optimization (DFO) algorithms for generating targeted black-box adversarial attacks on Deep Neural Network (DNN) classifiers, assuming the perturbation energy is bounded by an $\ell^\infty$ constraint and the number of queries to the network is limited. This paper considers four pre-existing state-of-the-art DFO-based algorithms and introduces a new algorithm built on BOBYQA, a model-based DFO method. We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify within a maximum number of queries to the DNN. The experiments reveal how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack: algorithms that restrict the search for adversarial examples to the vertices of the $\ell^\infty$ constraint work particularly well in the absence of structural defenses, while the proposed BOBYQA-based algorithm works better for especially small perturbation energies. This variation in performance highlights the importance of comparing new algorithms to the state-of-the-art in a variety of settings, and of testing the effectiveness of adversarial defenses with as wide a range of algorithms as possible.
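For illustration, below is a minimal Python sketch of the kind of query-limited, $\ell^\infty$-bounded targeted attack studied here. It is not the paper's implementation: the black-box oracle `query_logits` (a random linear stand-in for a 10-class DNN), the search strategy (a greedy random sign-flip search over the vertices of the $\ell^\infty$ ball), and all parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for the target network: a fixed random linear
# 10-class "classifier" on 28x28 inputs. In practice, query_logits would
# wrap a single (counted) query to the real black-box DNN.
rng_net = np.random.default_rng(42)
W = rng_net.normal(size=(10, 28 * 28))

def query_logits(x):
    """One black-box query: returns the 10 class logits for input x."""
    return W @ x.ravel()

def targeted_vertex_attack(x0, target, eps=0.1, max_queries=1000, seed=0):
    """Greedy random search over the vertices of the l_inf ball of radius
    eps: every pixel perturbation is held at +eps or -eps, and a candidate
    sign flip is kept only if it increases the target-class margin."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=x0.shape)

    def margin(s):
        # Margin of the target class over the best other class;
        # > 0 means the perturbed image is classified as `target`.
        logits = query_logits(np.clip(x0 + eps * s, 0.0, 1.0))
        others = np.delete(logits, target)
        return logits[target] - others.max()

    best = margin(signs)
    queries = 1
    while queries < max_queries and best <= 0:
        i = rng.integers(signs.size)   # pick one coordinate at random
        signs.flat[i] *= -1.0          # propose flipping its sign
        cand = margin(signs)
        queries += 1
        if cand > best:
            best = cand                # keep the improving flip
        else:
            signs.flat[i] *= -1.0      # revert the flip
    return np.clip(x0 + eps * signs, 0.0, 1.0), best > 0, queries

# Example usage on a synthetic "image" in [0, 1]:
x0 = np.clip(rng_net.normal(0.5, 0.2, size=(28, 28)), 0.0, 1.0)
adv, success, used = targeted_vertex_attack(x0, target=3)
print(f"success={success} after {used} queries")
```

A model-based method such as the BOBYQA variant would instead fit a local quadratic model of the margin from past queries and minimize it within the $\ell^\infty$ box, trading extra per-iteration computation for fewer queries to the network.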