Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings, instead of relying on computationally expensive pixel-level perturbations. We propose Adversarial Feature Augmentation and Normalization (A-FAN), which (i) first augments visual recognition models with adversarial features that integrate flexible scales of perturbation strengths, and (ii) then extracts adversarial feature statistics from batch normalization and re-injects them into clean features through feature normalization. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks, including ResNets and EfficientNets for classification, Faster-RCNN for detection, and Deeplab V3+ for segmentation. Extensive experiments show that A-FAN yields consistent generalization improvement over strong baselines across various classification, detection, and segmentation datasets, including CIFAR-10, CIFAR-100, ImageNet, Pascal VOC2007, Pascal VOC2012, COCO2017, and Cityscapes. Comprehensive ablation studies and detailed analyses also demonstrate that adding perturbations to specific modules and layers of classification/detection/segmentation backbones yields optimal performance. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_A-FAN.
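To make the two-step mechanism concrete, the following is a minimal PyTorch-style sketch of feature-level adversarial augmentation followed by statistic re-injection. It is an illustrative approximation only: the function names (`adversarial_feature_augment`, `reinject_adversarial_stats`), the FGSM-style single-step attack, the `head` module, and the `epsilon` parameter are assumptions for exposition, not the released A-FAN implementation, and the per-channel batch statistics stand in for the batch-normalization statistics described above.

```python
import torch
import torch.nn.functional as F


def adversarial_feature_augment(features, labels, head, epsilon=1.0):
    """Perturb intermediate features adversarially (single FGSM-style step).

    `head` maps the feature map to logits; `epsilon` controls the
    perturbation strength applied in feature space.
    """
    features = features.detach().requires_grad_(True)
    loss = F.cross_entropy(head(features), labels)
    grad, = torch.autograd.grad(loss, features)
    return (features + epsilon * grad.sign()).detach()


def reinject_adversarial_stats(clean_feat, adv_feat, eps=1e-5):
    """Feature normalization: normalize clean features with their own
    per-channel statistics, then re-scale and re-shift them with the
    statistics of the adversarial features."""
    dims = (0, 2, 3)  # batch and spatial dimensions of an NCHW feature map
    c_mean = clean_feat.mean(dim=dims, keepdim=True)
    c_std = clean_feat.std(dim=dims, keepdim=True) + eps
    a_mean = adv_feat.mean(dim=dims, keepdim=True)
    a_std = adv_feat.std(dim=dims, keepdim=True) + eps
    return (clean_feat - c_mean) / c_std * a_std + a_mean
```

In a training loop, the clean feature map from an intermediate backbone layer would be perturbed with the first function, its adversarial statistics re-injected with the second, and the resulting features passed to the remaining layers alongside the standard clean forward pass.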