A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical fields. Adversarial examples are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective defense method that improves the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. Therefore, this work investigates the similarities and differences between normally trained CNNs (NT-CNNs) and adversarially trained CNNs (AT-CNNs) in information extraction, from the mutual information perspective. We show that 1) for both NT-CNNs and AT-CNNs, the mutual information trends on original and adversarial examples are similar throughout training; 2) compared with normal training, adversarial training is more difficult, and AT-CNNs extract less information from the input; 3) CNNs trained with different methods have different preferences for certain types of information: NT-CNNs tend to extract texture-based information from the input, while AT-CNNs prefer shape-based information. Adversarial examples may mislead CNNs because they contain more texture-based information about other classes. Furthermore, we also analyze the mutual information estimators used in this work and find that they outline the geometric properties of the middle layers' outputs.
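For readers unfamiliar with the setup, the sketch below illustrates one standard way to realize adversarial training, using an L-infinity PGD attack in PyTorch. The attack, perturbation budget, step sizes, and training step shown here are illustrative assumptions for a generic image classifier, not the exact configuration used in this work.

```python
# Minimal PGD adversarial-training sketch (illustrative assumptions only).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L_inf-bounded adversarial examples with projected gradient descent."""
    # Random start inside the eps-ball around the original examples.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One AT step: update the model on adversarial examples instead of clean ones."""
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Normal training corresponds to replacing `x_adv` with the clean batch `x`; the paper's comparison of NT-CNNs and AT-CNNs contrasts the information extracted by models trained under these two regimes.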