Although Deep Neural Networks (DNNs) achieve excellent performance on many real-world tasks, they are highly vulnerable to adversarial attacks. A leading defense against such attacks is adversarial training, a technique in which a DNN is trained to be robust to adversarial attacks by introducing adversarial noise into its input. This procedure is effective but must be performed during the training phase. In this work, we propose Augmented Random Forest (ARF), a simple and easy-to-use strategy for robustifying an existing pretrained DNN without modifying its weights. For every image, we generate randomized test-time augmentations by applying diverse color, blur, noise, and geometric transforms. We then use the DNN's logit outputs to train a simple random forest that predicts the true class label. Our method achieves state-of-the-art adversarial robustness against a diverse set of white-box and black-box attacks, with minimal compromise in classification accuracy on natural images. We also evaluate ARF against numerous adaptive white-box attacks, where it shows excellent results when combined with adversarial training. Code is available at https://github.com/giladcohen/ARF.
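The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pretrained DNN is replaced by a hypothetical stand-in logit function, the augmentation set is reduced to noise and brightness jitter (the paper uses a richer mix of color, blur, noise, and geometric transforms), and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
NUM_CLASSES, NUM_AUGS = 3, 8

def dummy_logits(img):
    # Hypothetical stand-in for a frozen pretrained DNN's logit output;
    # ARF would query the real network here, with weights untouched.
    base = np.zeros(NUM_CLASSES)
    base[int(img.mean() * NUM_CLASSES) % NUM_CLASSES] = 5.0
    return base + rng.normal(scale=0.5, size=NUM_CLASSES)

def augment(img):
    # Randomized test-time augmentation (simplified: additive noise
    # plus brightness jitter instead of the paper's full transform set).
    out = img + rng.normal(scale=0.05, size=img.shape)
    return np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)

def arf_features(img):
    # Concatenate the DNN's logits over all augmented copies of the
    # image into a single feature vector for the random forest.
    return np.concatenate([dummy_logits(augment(img)) for _ in range(NUM_AUGS)])

# Synthetic "dataset": constant images whose mean encodes the class.
images = [np.full((8, 8), (c + 0.5) / NUM_CLASSES)
          for c in range(NUM_CLASSES) for _ in range(30)]
labels = [c for c in range(NUM_CLASSES) for _ in range(30)]

# Train a simple random forest on the augmented-logit features to
# predict the true class label.
X = np.stack([arf_features(im) for im in images])
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(forest.score(X, labels))
```

At inference time, a new image would go through the same `arf_features` step and be classified by `forest.predict`; the intuition is that adversarial perturbations crafted for the raw DNN are disrupted by the randomized transforms, while the forest learns to read the class signal from the resulting logit patterns.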