Deep learning models are prone to being fooled by imperceptible perturbations known as adversarial attacks. In this work, we study how equipping models with Test-time Transformation Ensembling (TTE) can work as a reliable defense against such attacks. While transforming the input data, both at train and test times, is known to enhance model performance, its effects on adversarial robustness have not been studied. Here, we present a comprehensive empirical study of the impact of TTE, in the form of widely-used image transforms, on adversarial robustness. We show that TTE consistently improves model robustness against a variety of powerful attacks without any need for re-training, and that this improvement comes at virtually no trade-off with accuracy on clean samples. Finally, we show that the benefits of TTE transfer even to the certified robustness domain, in which TTE provides sizable and consistent improvements.
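To make the core idea concrete, below is a minimal sketch of test-time transformation ensembling: a pretrained classifier's softmax predictions are averaged over several transformed copies of each input, with no re-training. The particular transform set (identity, horizontal flip, small pixel shifts) and the choice to average in probability space are illustrative assumptions, not necessarily the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def tte_predict(model, x, transforms):
    """Average softmax predictions over a set of test-time transforms.

    model:      a classifier mapping image batches (N, C, H, W) to logits (N, K)
    x:          a batch of input images
    transforms: callables mapping an image batch to a transformed batch
    """
    model.eval()
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(t(x)), dim=1) for t in transforms])
    return probs.mean(dim=0)  # ensembled class probabilities, shape (N, K)

# Illustrative transform set (an assumption for this sketch): identity,
# horizontal flip, and small horizontal shifts via torch.roll.
transforms = [
    lambda x: x,                                  # identity
    lambda x: torch.flip(x, dims=[-1]),           # horizontal flip
    lambda x: torch.roll(x, shifts=2, dims=-1),   # shift right by 2 pixels
    lambda x: torch.roll(x, shifts=-2, dims=-1),  # shift left by 2 pixels
]
```

Because the transforms here are plain tensor operations, the ensembled predictor remains differentiable, so it can still be evaluated against gradient-based attacks end to end rather than hiding behind gradient masking.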