Whereas adversarial training can be effective against specific adversarial perturbations, it has proven ineffective in generalizing to attacks that deviate from those used during training. However, we observe that this ineffectiveness is intrinsically connected to domain adaptability, another crucial issue in deep learning for which adversarial domain adaptation appears to be a promising solution. Consequently, we propose Adv-4-Adv, a novel adversarial training method that aims to retain robustness against unseen adversarial perturbations. Essentially, Adv-4-Adv treats attacks incurring different perturbations as distinct domains, and by leveraging the power of adversarial domain adaptation, it aims to remove the domain/attack-specific features. This forces a trained model to learn a robust domain-invariant representation, which in turn enhances its generalization ability. Extensive evaluations on Fashion-MNIST, SVHN, CIFAR-10, and CIFAR-100 demonstrate that a model trained by Adv-4-Adv on samples crafted by simple attacks (e.g., FGSM) generalizes to more advanced attacks (e.g., PGD), and its performance exceeds that of state-of-the-art proposals on these datasets.
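To make the "simple attack" concrete, the following is a minimal NumPy sketch of FGSM (the Fast Gradient Sign Method) applied to a binary logistic-regression model. The model, its weights, and the budget `eps` are illustrative assumptions chosen so the gradient has a closed form; Adv-4-Adv itself crafts such samples for deep networks, where the input gradient comes from backpropagation.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Craft an FGSM adversarial example under an L-infinity budget eps.

    For logistic regression with binary cross-entropy loss, the gradient
    of the loss w.r.t. the input x is (sigmoid(w @ x + b) - y) * w, so a
    single signed gradient step maximally increases the loss per coordinate.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # model's predicted probability
    grad_x = (p - y) * w                    # closed-form input gradient
    return x + eps * np.sign(grad_x)        # one signed step of size eps
```

A PGD attack, against which the abstract evaluates generalization, is essentially this step iterated several times with a projection back into the eps-ball around the original input after each step.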