The phenomenon of adversarial examples illustrates one of the most basic vulnerabilities of deep neural networks. Among the variety of techniques introduced to surmount this inherent weakness, adversarial training has emerged as the most effective strategy for learning robust models. Typically, this is achieved by balancing robust and natural objectives. In this work, we aim to further optimize the trade-off between robust and standard accuracy by enforcing a domain-invariant feature representation. We present a new adversarial training method, Domain Invariant Adversarial Learning (DIAL), which learns a feature representation that is both robust and domain invariant. DIAL uses a variant of the Domain Adversarial Neural Network (DANN) on the natural domain and its corresponding adversarial domain. Treating the natural examples as the source domain and the adversarially perturbed examples as the target domain, our method learns a feature representation constrained not to discriminate between natural and adversarial examples, and can therefore achieve a more robust representation. DIAL is a generic and modular technique that can be easily incorporated into any adversarial training method. Our experiments indicate that incorporating DIAL in the adversarial training process improves both robustness and standard accuracy.
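The domain-adversarial idea described in the abstract can be illustrated with a toy scalar model. The sketch below is not the authors' implementation: the function names, the one-parameter feature extractor, the logistic domain classifier, and the hyperparameters `lam` and `lr` are all assumptions made for illustration, and the robust and natural classification losses that DIAL would also optimize are omitted. It shows only the gradient-reversal mechanism used in DANN-style training: the domain classifier descends the domain loss, while the feature extractor receives the same gradient with its sign flipped, which pushes the features toward being indistinguishable across the natural and adversarial domains.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, x_nat, x_adv):
    """Binary cross-entropy of the domain classifier.

    Natural inputs carry domain label 0, adversarial inputs label 1.
    """
    p_nat = sigmoid(v * w * x_nat)
    p_adv = sigmoid(v * w * x_adv)
    return -math.log(1.0 - p_nat) - math.log(p_adv)


def domain_adversarial_step(w, v, x_nat, x_adv, lam=1.0, lr=0.1):
    """One illustrative update with gradient reversal (hypothetical helper).

    feature  f = w * x            (feature extractor, parameter w)
    domain   p = sigmoid(v * f)   (domain classifier, parameter v)
    """
    grad_w, grad_v = 0.0, 0.0
    for x, d in ((x_nat, 0.0), (x_adv, 1.0)):
        f = w * x
        p = sigmoid(v * f)
        dlogit = p - d               # d(BCE)/d(logit) for a sigmoid output
        grad_v += dlogit * f         # classifier: ordinary gradient
        grad_f = dlogit * v          # gradient flowing back into the feature
        # Gradient reversal: flip the sign and scale by lam before it
        # reaches the feature extractor's parameter.
        grad_w += (-lam * grad_f) * x
    return w - lr * grad_w, v - lr * grad_v
```

In this toy setting, updating `v` alone lowers `domain_loss`, while updating `w` alone raises it. That opposition is the sense in which the feature representation is constrained not to discriminate between natural and adversarial inputs. Setting `lam=0` switches the reversal branch off and leaves `w` unchanged by the domain term.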