Although Deep Neural Networks (DNNs) achieve excellent performance on many real-world tasks, they are highly vulnerable to adversarial attacks. A leading defense against such attacks is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks by introducing adversarial noise to its input. This procedure is effective but must be applied during the training phase. In this work, we propose a new, simple, and easy-to-use technique, KATANA, for robustifying an existing pretrained DNN without modifying its weights. For every image, we generate N randomized Test-Time Augmentations (TTAs) by applying diverse color, blur, noise, and geometric transforms. Next, we use the DNN's logits output on these TTAs to train a simple random forest classifier that predicts the true class label. Our strategy achieves state-of-the-art adversarial robustness against diverse attacks with minimal compromise on natural-image classification accuracy. We also test KATANA against two adaptive white-box attacks, where it shows excellent results when combined with adversarial training. Code is available at https://github.com/giladcohen/KATANA.
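The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the DNN is replaced by a fixed stand-in function, the diverse color/blur/noise/geometric transforms are approximated by additive Gaussian noise, and all names and hyperparameters are hypothetical.

```python
# Hypothetical sketch of the KATANA pipeline: TTA logits -> random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
NUM_CLASSES, N_TTA = 10, 16

def dnn_logits(img):
    """Stand-in for a pretrained DNN: any fixed map from image to logits.
    (Hypothetical: a frozen linear projection of the flattened image.)"""
    W = np.linspace(-1.0, 1.0, img.size * NUM_CLASSES).reshape(img.size, NUM_CLASSES)
    return img.ravel() @ W

def tta_logit_features(img, n=N_TTA):
    """Apply n randomized augmentations (Gaussian noise here, as a proxy for
    the paper's color/blur/noise/geometric transforms) and stack the logits
    into one feature vector of length n * NUM_CLASSES."""
    feats = [dnn_logits(img + 0.1 * rng.standard_normal(img.shape)) for _ in range(n)]
    return np.concatenate(feats)

# Train the random forest on TTA-logit features of training images
# (toy random data here; the DNN itself stays frozen throughout).
X_train = rng.random((64, 8, 8))
y_train = rng.integers(0, NUM_CLASSES, size=64)
features = np.stack([tta_logit_features(x) for x in X_train])
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(features, y_train)

# At test time, the class is predicted from the TTA logits,
# not from the raw DNN output on the (possibly adversarial) image.
pred = rf.predict(tta_logit_features(X_train[0])[None, :])
```

The key design point is that only the lightweight random forest is trained; the DNN's weights are never touched, which is what makes the defense applicable to any pretrained model.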