Although fast adversarial training provides an efficient approach for building robust networks, it may suffer from a serious problem known as catastrophic overfitting (CO), where the multi-step robust accuracy suddenly collapses to zero. In this paper, we for the first time decouple FGSM adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting". Self-fitting, i.e., DNNs learning the self-information embedded in single-step perturbations, naturally leads to the occurrence of CO. When self-fitting occurs, the network exhibits an obvious "channel differentiation" phenomenon, in which some convolution channels responsible for recognizing self-information become dominant, while others responsible for data-information are suppressed. As a result, the network learns to recognize only images that carry sufficient self-information and loses its ability to generalize to other types of data. Based on self-fitting, we provide new insight into existing methods for mitigating CO and extend CO to multi-step adversarial training. Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.
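For context on the single-step perturbations discussed above, the following is a minimal sketch of the FGSM attack on a toy logistic-regression model. The function name, the toy model, and all parameter values are illustrative assumptions, not taken from the paper; real fast adversarial training applies the same sign-of-gradient step to a deep network's loss.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss).

    Toy setup (hypothetical): logistic regression with cross-entropy loss,
    where the input gradient is analytically (sigmoid(w.x + b) - y) * w.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # model's probability for class 1
    grad_x = (p - y) * w                    # gradient of the loss w.r.t. the input
    return x + eps * np.sign(grad_x)        # one sign-gradient ascent step on the loss

# Illustrative usage with random data.
rng = np.random.default_rng(0)
w = rng.standard_normal(4)
x = rng.standard_normal(4)
x_adv = fgsm_perturb(x, w, b=0.0, y=1, eps=8 / 255)
```

Because the perturbation is `eps * sign(...)`, the adversarial example always stays within the L-infinity ball of radius `eps` around the clean input, which is the constraint FGSM is designed to satisfy in one step.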