Despite the success of convolutional neural networks (CNNs) on many academic benchmarks for computer vision tasks, their real-world application still faces fundamental challenges. One of these open problems is their inherent lack of robustness, unveiled by the striking effectiveness of adversarial attacks. Current attack methods are able to manipulate the network's prediction by adding specific but small amounts of noise to the input. In turn, adversarial training (AT) aims to achieve robustness against such attacks and, ideally, better model generalization by including adversarial samples in the training set. However, an in-depth analysis of the resulting robust models beyond adversarial robustness is still pending. In this paper, we empirically analyze a variety of adversarially trained models that achieve high robust accuracies when facing state-of-the-art attacks, and we show that AT has an interesting side effect: it leads to models that are significantly less overconfident in their decisions than non-robust models, even on clean data. Further, our analysis of robust models shows that not only AT but also the models' building blocks (like activation functions and pooling) have a strong influence on the models' prediction confidences. Data & Project website: https://github.com/GeJulia/robustness_confidences_evaluation
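For readers unfamiliar with the two ingredients mentioned above, the following is a minimal, illustrative PyTorch sketch of (a) a single PGD-based adversarial training step and (b) the mean top-1 softmax confidence used to gauge overconfidence. The hyperparameters (eps, alpha, steps) and helper names are placeholder assumptions and do not correspond to the exact settings or implementation evaluated in the paper.

```python
# Illustrative sketch only; not the paper's exact training or evaluation code.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity bounded adversarial examples with projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()          # gradient-sign step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps) # project back into eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One AT update: include adversarial samples in the training objective."""
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

def mean_confidence(model, x):
    """Mean top-1 softmax confidence on a batch, a simple proxy for overconfidence."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return probs.max(dim=1).values.mean().item()
```

Comparing `mean_confidence` on clean data for a standard model versus an adversarially trained one illustrates the side effect discussed in the paper: robust models tend to assign markedly lower confidence to their predictions.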