The discovery of adversarial examples revealed one of the most basic vulnerabilities of deep neural networks. Among the techniques introduced to address this inherent weakness, adversarial training has been shown to be the most common and efficient strategy for achieving robustness. It is usually done by balancing the robust and natural losses. In this work, we aim to achieve a better trade-off between robust and natural performance by enforcing a domain-invariant feature representation. We present a new adversarial training method, Domain Invariant Adversarial Learning (DIAL), which learns a feature representation that is both robust and domain invariant. DIAL uses a variant of the Domain Adversarial Neural Network (DANN) on the natural domain and its corresponding adversarial domain. When the source domain consists of natural examples and the target domain consists of their adversarially perturbed counterparts, our method learns a feature representation constrained not to discriminate between natural and adversarial examples, and can therefore achieve a better representation. We demonstrate this advantage by improving both robustness and natural accuracy compared to current state-of-the-art adversarial training methods.
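The DANN-style mechanism the abstract refers to can be illustrated with a toy gradient-reversal step. The following is a minimal sketch under stated assumptions, not the DIAL objective itself, which the abstract does not specify: the scalar feature extractor `w`, the scalar domain classifier `v`, the reversal weight `r`, the learning rate `lr`, and the input values are all hypothetical, and the classification and robust losses are left out so that only the domain-invariance term is visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss_and_grads(w, v, inputs, domain_labels):
    """Mean binary cross-entropy of a toy domain classifier, with its gradients.

    Feature extractor: h = w * x. Domain classifier: p = sigmoid(v * h).
    Domain label 0 = natural example, 1 = adversarial example.
    """
    n = len(inputs)
    loss = grad_w = grad_v = 0.0
    for x, y in zip(inputs, domain_labels):
        h = w * x
        p = sigmoid(v * h)
        loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        grad_w += (p - y) * v * x / n   # d(loss)/dw via the chain rule
        grad_v += (p - y) * h / n       # d(loss)/dv
    return loss, grad_w, grad_v

def dann_style_step(w, v, inputs, domain_labels, lr=0.1, r=1.0):
    """One update: the classifier descends, the feature extractor gets a reversed gradient."""
    _, grad_w, grad_v = domain_loss_and_grads(w, v, inputs, domain_labels)
    new_v = v - lr * grad_v            # domain classifier minimizes the domain loss
    new_w = w - lr * (-r * grad_w)     # gradient reversal: sign flipped, scaled by r
    return new_w, new_v

inputs = [1.0, 1.5]        # hypothetical natural input and its perturbed counterpart
domain_labels = [0, 1]
w, v = 1.0, 1.0
loss_before, _, _ = domain_loss_and_grads(w, v, inputs, domain_labels)
new_w, new_v = dann_style_step(w, v, inputs, domain_labels)
# Re-evaluate with the classifier held fixed to isolate the effect of the reversed step.
loss_after, _, _ = domain_loss_and_grads(new_w, v, inputs, domain_labels)
```

Because the feature extractor moves against the domain classifier's gradient, `loss_after` exceeds `loss_before` when the classifier is held fixed: the features become harder to attribute to the natural or the adversarial domain. In a full training loop this term would be combined with the natural and robust classification losses, with `r` controlling how strongly invariance is enforced.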