Neural networks suffer from a number of shortcomings. Among the most severe is their sensitivity to distribution shifts: models can be fooled into wrong predictions by small input perturbations that are often imperceptible to humans and need not carry semantic meaning. Adversarial training offers a partial remedy by training models on worst-case perturbations. Yet recent work has also pointed out that neural networks reason differently from humans: humans identify objects primarily by shape, whereas neural networks rely mainly on texture cues. For example, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, adversarial training has been shown to favorably shift models toward a shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect across various architectures, the common $\ell_2$- and $\ell_\infty$-training, and Transformer-based models. Furthermore, we offer a possible explanation of this phenomenon from a frequency perspective.
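For reference, "training models on worst-case perturbations" refers to the standard robust min-max objective; this is the common formulation rather than anything specific to this work, with $f_\theta$ denoting the network, $\mathcal{L}$ the loss, and $\epsilon$ the perturbation budget:

$$
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_p \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta),\, y\big) \Big],
$$

where $p \in \{2, \infty\}$ corresponds to the $\ell_2$- and $\ell_\infty$-threat models mentioned above, and the inner maximization is typically approximated with projected gradient descent.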