Deep neural networks (DNNs) are vulnerable to adversarial noise. A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise, among which input pre-processing methods are scalable and show great potential for safeguarding DNNs. However, pre-processing methods may suffer from the robustness degradation effect, in which the defense reduces rather than improves the adversarial robustness of a target model in a white-box setting. A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model. To solve this problem, we investigate the influence of full adversarial examples, which are crafted against the full model, and find that they indeed have a positive impact on the robustness of defenses. Furthermore, we find that simply changing the adversarial training examples in pre-processing methods does not completely alleviate the robustness degradation effect. This is because the adversarial risk of the pre-processed model is neglected, which is another cause of the robustness degradation effect. Motivated by the above analyses, we propose a method called Joint Adversarial Training based Pre-processing (JATP) defense. Specifically, we formulate a feature-similarity-based adversarial risk for the pre-processing model by using full adversarial examples found in a feature space. Unlike standard adversarial training, we update only the pre-processing model, which prompts us to introduce a pixel-wise loss to improve its cross-model transferability. We then conduct joint adversarial training on the pre-processing model to minimize this overall risk. Empirical results show that our method effectively mitigates the robustness degradation effect across different target models in comparison to previous state-of-the-art approaches.
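To make the described procedure concrete, below is a minimal PyTorch sketch of the training loop the abstract outlines: full adversarial examples are crafted against the composed model (classifier applied to the pre-processed input), the pre-processing model is trained with a feature-similarity risk plus a pixel-wise loss, and only the pre-processing model is updated. The architectures, the PGD attack settings, the feature extractor `feat_extractor`, and the weight `lambda_pix` are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of joint adversarial training for a pre-processing defense (JATP-style).
# Assumptions: classifier and feat_extractor are pretrained and frozen
# (requires_grad=False); the optimizer holds only the preprocessor's parameters.
import torch
import torch.nn.functional as F

def pgd_full_attack(preprocessor, classifier, x, y,
                    eps=8/255, alpha=2/255, steps=10):
    """Craft 'full' adversarial examples against the composed model
    classifier(preprocessor(x)), not against the classifier alone."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(classifier(preprocessor(x_adv)), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # L-infinity PGD step with projection back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def jatp_step(preprocessor, classifier, feat_extractor, x, y, opt,
              lambda_pix=1.0):
    """One joint adversarial training step; only the preprocessor is updated."""
    x_adv = pgd_full_attack(preprocessor, classifier, x, y)
    out = preprocessor(x_adv)
    # Feature-similarity adversarial risk: pull features of the purified
    # adversarial input toward those of the clean input.
    feat_loss = F.mse_loss(feat_extractor(out), feat_extractor(x).detach())
    # Pixel-wise loss, intended to improve cross-model transferability.
    pix_loss = F.mse_loss(out, x)
    loss = feat_loss + lambda_pix * pix_loss
    opt.zero_grad()
    loss.backward()
    opt.step()  # optimizer was built only from preprocessor.parameters()
    return loss.item()
```

The key design choice this sketch reflects is that, unlike standard adversarial training, the target classifier's weights are never touched: only the pre-processing model is optimized, and the pixel-wise term keeps its output close to the clean input so the purified images remain usable by target models other than the one the attack was crafted against.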


