The widespread adoption of deep neural networks in computer vision applications has generated significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. This motivates the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining the theoretical motivation and implications, we experimentally show that our method can provide adversarial robustness to multiple independently pre-trained classifiers that are otherwise ineffective against an adaptive white-box adversary. Furthermore, we show that RTFEs can even provide one-shot adversarial robustness to models independently trained on different datasets.