Adversarial robustness is considered a required property of deep neural networks. In this study, we find that adversarially trained models can have significantly different characteristics in terms of margin and smoothness, even when they exhibit similar robustness. Inspired by this observation, we investigate the effect of different regularizers and discover the negative effect of the smoothness regularizer on maximizing the margin. Based on these analyses, we propose a new method called bridged adversarial training, which mitigates this negative effect by bridging the gap between clean and adversarial examples. We provide theoretical and empirical evidence that the proposed method achieves stable and better robustness, especially for large perturbations.
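To make the bridging idea concrete, below is a minimal PyTorch sketch of one way such a loss could be instantiated; the abstract does not specify the exact formulation, so the linear interpolation path, the KL-based consistency term, and the names `bridged_loss`, `num_bridges`, and `beta` are all illustrative assumptions, not the paper's definitive method.

```python
# A minimal sketch (not necessarily the paper's exact formulation) of a
# "bridged" adversarial loss: intermediate points between the clean input
# x and its adversarial counterpart x_adv are penalized for prediction
# drift, so the smoothness pressure is spread along the path rather than
# applied directly between x and x_adv. `num_bridges` and `beta` are
# hypothetical hyperparameter names.
import torch
import torch.nn.functional as F

def bridged_loss(model, x, x_adv, y, num_bridges=3, beta=6.0):
    # Standard adversarial classification term on the endpoint.
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_adv, y)

    # Consistency terms between consecutive points on the straight line
    # from x (k = 0) to x_adv (k = num_bridges).
    kl_sum = 0.0
    prev_logp = F.log_softmax(model(x), dim=1)
    for k in range(1, num_bridges + 1):
        x_k = x + (k / num_bridges) * (x_adv - x)
        logp_k = F.log_softmax(model(x_k), dim=1)
        # KL(p_{k-1} || p_k), averaged over the batch.
        kl_sum = kl_sum + F.kl_div(logp_k, prev_logp,
                                   log_target=True, reduction='batchmean')
        prev_logp = logp_k

    return ce + beta * kl_sum
```

Decomposing the clean-to-adversarial gap into several shorter segments, as in this sketch, means each smoothness term only constrains a small step, which is one plausible mechanism for reducing the tension with margin maximization that the abstract describes.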