Despite clear computational advantages in building robust neural networks, adversarial training (AT) using single-step methods is unstable, as it suffers from catastrophic overfitting (CO): networks gain non-trivial robustness during the early stages of adversarial training, but suddenly reach a breaking point where they lose all robustness in just a few iterations. Although some works have succeeded at preventing CO, the different mechanisms that lead to this remarkable failure mode are still poorly understood. In this work, however, we find that the interplay between the structure of the data and the dynamics of AT plays a fundamental role in CO. Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction. The code to reproduce the experiments of this paper can be found at https://github.com/gortizji/co_features.
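To make the setting concrete, the sketch below illustrates what "single-step AT" refers to: adversarial training where each perturbation is computed with a single FGSM step, the regime in which CO is observed. This is a minimal illustrative example, not the paper's actual training code; the names (`model`, `loader`, `optimizer`, `eps`) and hyperparameters are assumptions.

```python
# Minimal sketch of single-step (FGSM) adversarial training, the regime in which
# catastrophic overfitting (CO) can occur. Illustrative only; not the paper's code.
import torch
import torch.nn.functional as F


def fgsm_adv_train_epoch(model, loader, optimizer, eps=8 / 255, device="cpu"):
    """Run one epoch of single-step adversarial training with the FGSM attack."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # Single-step attack: one signed-gradient step of size eps.
        delta = torch.zeros_like(x, requires_grad=True)
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            x_adv = torch.clamp(x + eps * delta.grad.sign(), 0.0, 1.0)

        # Standard training step on the adversarial examples.
        optimizer.zero_grad()
        adv_loss = F.cross_entropy(model(x_adv), y)
        adv_loss.backward()
        optimizer.step()
```

In this regime, CO typically shows up as a sudden collapse of accuracy against stronger multi-step attacks (e.g., PGD), even while accuracy against the single-step FGSM attack used for training remains high.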