It has been extensively demonstrated that Deep Neural Networks (DNNs) are vulnerable to Adversarial Examples (AEs). As increasingly advanced adversarial attack methods have been developed, a number of corresponding defense solutions have been designed to enhance the robustness of DNN models. It has become popular to leverage data augmentation techniques to preprocess input samples before inference in order to remove adversarial perturbations. By obfuscating the gradients of DNN models, these approaches can defeat a considerable number of conventional attacks. Unfortunately, advanced gradient-based attack techniques (e.g., BPDA and EOT) have been introduced to invalidate these preprocessing effects. In this paper, we present FenceBox, a comprehensive framework to defeat various kinds of adversarial attacks. FenceBox is equipped with 15 data augmentation methods from three different categories, and we comprehensively evaluate how effectively these methods mitigate various adversarial attacks. FenceBox also provides APIs for users to easily deploy defenses over their models in different modes: they can either select an arbitrary preprocessing method, or a combination of functions for a stronger robustness guarantee, even under advanced adversarial attacks. We open-source FenceBox and expect it to serve as a standard toolkit facilitating research on adversarial attacks and defenses.
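To illustrate the kind of preprocessing-based defense described above, the following is a minimal sketch of one classic data augmentation method of this family (random resizing and padding applied before inference). The class name and parameters are illustrative assumptions for exposition, not the actual FenceBox API.

```python
# Minimal sketch of a preprocessing-based defense wrapper (assumed names,
# not the released FenceBox interface): the input is randomly resized and
# padded before being fed to the classifier, which perturbs the exact pixel
# layout an adversary optimized against.
import torch
import torch.nn.functional as F


class PreprocessingDefense(torch.nn.Module):
    def __init__(self, model, out_size=224, min_scale=0.85):
        super().__init__()
        self.model = model
        self.out_size = out_size
        self.min_scale = min_scale

    def forward(self, x):
        # Randomly rescale the input to a smaller resolution.
        scale = self.min_scale + (1.0 - self.min_scale) * torch.rand(1).item()
        small = int(self.out_size * scale)
        x = F.interpolate(x, size=(small, small),
                          mode="bilinear", align_corners=False)
        # Pad back to the expected resolution at a random offset.
        pad = self.out_size - small
        left = torch.randint(0, pad + 1, (1,)).item()
        top = torch.randint(0, pad + 1, (1,)).item()
        x = F.pad(x, (left, pad - left, top, pad - top))
        return self.model(x)
```

Because the transformation is randomized at every forward pass, gradients computed through a single transformed input are a poor guide for attack optimization; this is exactly the gradient obfuscation that BPDA and EOT were designed to circumvent, motivating the combined-defense modes mentioned above.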