Black-box adversarial attacks generate adversarial samples via iterative optimization driven by repeated queries. Defending deep neural networks against such attacks has been challenging. In this paper, we propose an efficient Boundary Defense (BD) method that mitigates black-box attacks by exploiting the fact that adversarial optimization typically relies on samples near the classification boundary. Our method detects boundary samples as those with low classification confidence and adds white Gaussian noise to their logits. The method's impact on the deep network's classification accuracy is analyzed theoretically. Extensive experiments are conducted, and the results show that the BD method can reliably defend against both soft-label and hard-label black-box attacks, outperforming a range of existing defense methods. For IMAGENET models, adding zero-mean white Gaussian noise with standard deviation 0.1 to the logits whenever the classification confidence is below 0.3 reduces the attack success rate to almost 0 while limiting the classification accuracy degradation to around 1 percent.
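As a rough illustration of the defense described above, the following sketch (in NumPy; the function name boundary_defense and the parameter names tau and sigma are illustrative assumptions, not from the paper) shows how noise injection could be gated on the top-1 softmax confidence:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def boundary_defense(logits, tau=0.3, sigma=0.1, rng=None):
    """Hypothetical sketch of the BD idea: if the top-1 softmax confidence
    of a sample is below tau, add zero-mean white Gaussian noise with
    standard deviation sigma to its logits; otherwise leave them unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    confidence = softmax(logits).max(axis=-1, keepdims=True)
    noise = rng.normal(0.0, sigma, size=logits.shape)
    # Perturb only the low-confidence (boundary) samples.
    return np.where(confidence < tau, logits + noise, logits)
```

The defended logits would then be returned to the querying client in place of the model's raw outputs, so that queries near the decision boundary receive noisy responses while high-confidence queries are unaffected.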