A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical fields. Adversarial examples are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective method to improve the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. Therefore, this work investigates the similarities and differences in information extraction between two types of CNNs (normal and robust ones) by observing trends in mutual information. We show that 1) the amount of mutual information that CNNs extract from original and adversarial examples is nearly identical, whether the CNNs undergo normal or adversarial training; the reason why adversarial examples mislead CNNs may be that they contain more texture-based information about other categories; 2) compared with normal training, adversarial training is more difficult, and the robust CNNs extract less information; 3) CNNs trained with different methods have different preferences for certain types of information: normally trained CNNs tend to extract texture-based information from the inputs, while adversarially trained models prefer shape-based information. Furthermore, we analyze the mutual information estimators used in this work, namely the kernel-density-estimation and binning methods, and find that these estimators outline the geometric properties of the intermediate layers' outputs to a certain extent.
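As an illustration of the kind of estimator the abstract refers to, the following is a minimal sketch of a binning-based mutual information estimator for two scalar variables. This is an assumption about the general technique, not the authors' exact implementation (which operates on layer activations); the function name and bin count are illustrative.

```python
import numpy as np

def binned_mutual_information(x, y, bins=30):
    """Estimate I(X; Y) in nats by discretizing both variables into
    equal-width bins and computing MI from the joint histogram.
    Illustrative sketch of the binning method; not the paper's code."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()             # joint distribution P(X, Y)
    px = pxy.sum(axis=1, keepdims=True)   # marginal P(X), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal P(Y), shape (1, bins)
    mask = pxy > 0                        # avoid log(0) on empty bins
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))
```

In practice, a dependence between X and Y drives the estimate well above zero, while independent samples yield an estimate near zero (up to a small positive bias that shrinks with the sample size and grows with the number of bins).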