A growing number of similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet, vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples on which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features, learned inside adversarially-robust (R) CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their R counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detect (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features, i.e., textures and colors (instead of objects); and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that make networks more adversarially robust and also explain some recent findings, e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).
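For readers unfamiliar with the training procedure the abstract refers to, below is a minimal sketch of PGD-based adversarial training (Madry et al. 2018), a common way of producing robust (R) networks; this is an illustration under assumed settings, not the paper's exact recipe. The hyperparameters (eps=4/255, 7 attack steps), the ResNet-50 choice, and the helper names `pgd_attack` and `adv_train_step` are all illustrative assumptions.

```python
# Sketch of adversarial training with an L-infinity PGD inner loop.
# Hyperparameters and helper names are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(num_classes=1000)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=7):
    """Craft L-infinity adversarial examples via projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()        # gradient-ascent step on the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep pixels in valid range
    return x_adv.detach()

def adv_train_step(model, x, y):
    """One adversarial-training step: fit the model on worst-case perturbed inputs."""
    model.eval()                      # freeze BN statistics while crafting the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point is the min-max structure: the inner loop maximizes the loss within a small perturbation budget, and the outer step minimizes the loss on those worst-case inputs, which is the process the paper studies for its effect on shape bias and hidden-neuron features.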