In response to the threat of adversarial examples, adversarial training offers an attractive way to enhance model robustness by training models on adversarial examples generated online. However, most existing adversarial training methods focus on improving robust accuracy by strengthening the adversarial examples while neglecting the growing shift between natural data and adversarial examples, which leads to a dramatic decrease in natural accuracy. To maintain the trade-off between natural and robust accuracy, we alleviate this shift from the perspective of feature adaptation and propose Feature Adaptive Adversarial Training (FAAT), which optimizes class-conditional feature adaptation across natural data and adversarial examples. Specifically, we incorporate a class-conditional discriminator that encourages the features to become (1) class-discriminative and (2) invariant to changes in adversarial attacks. The FAAT framework enables a trade-off between natural and robust accuracy by generating features with similar distributions across natural and adversarial data, and it achieves higher overall robustness thanks to the class-discriminative nature of these features. Experiments on various datasets demonstrate that FAAT produces more discriminative features and performs favorably against state-of-the-art methods. Code is available at https://github.com/VisionFlow/FAAT.
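To make the role of a class-conditional discriminator concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the discriminator emits 2K logits for K classes, with slot y meaning "natural, class y" and slot K + y meaning "adversarial, class y". The function names (`discriminator_target`, `feature_alignment_loss`) and this output layout are hypothetical and are not taken from the abstract or the linked repository.

```python
import math


def discriminator_target(class_label, is_adversarial, num_classes):
    """Return the joint (domain, class) slot in an assumed 2K-way output.

    Slots 0..K-1 stand for natural samples of each class,
    slots K..2K-1 for adversarial samples of each class.
    """
    if not 0 <= class_label < num_classes:
        raise ValueError("class_label out of range")
    return class_label + (num_classes if is_adversarial else 0)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def discriminator_loss(logits, class_label, is_adversarial, num_classes):
    """Loss for the discriminator: identify both the domain and the class."""
    target = discriminator_target(class_label, is_adversarial, num_classes)
    return cross_entropy(logits, target)


def feature_alignment_loss(adv_logits, class_label, num_classes):
    """Loss for the feature extractor on an adversarial example.

    The extractor is rewarded when the discriminator places the adversarial
    feature in the *natural* slot of the same class, which pushes features
    toward being class-discriminative and attack-invariant at once.
    """
    target = discriminator_target(class_label, False, num_classes)
    return cross_entropy(adv_logits, target)
```

Under this assumed layout, the discriminator and the feature extractor optimize opposing targets for adversarial inputs (slot K + y versus slot y), which is one common way to set up adversarial feature alignment; the actual FAAT objective may differ in its details.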