In this paper we introduce a provably stable architecture for Neural Ordinary Differential Equations (ODEs) that achieves non-trivial adversarial robustness under white-box attacks even when the network is trained naturally. Most existing defenses that withstand strong white-box attacks require adversarial training, and therefore have to strike a trade-off between natural accuracy and adversarial robustness. Inspired by dynamical system theory, we design a stabilized neural ODE network named SONet, whose ODE blocks are skew-symmetric and proved to be input-output stable. With natural training, SONet achieves robustness comparable to state-of-the-art adversarial defense methods without sacrificing natural accuracy. Even replacing only the first layer of a ResNet by such an ODE block yields a further improvement in robustness: under the PGD-20 attack ($\ell_\infty$ perturbation bound $0.031$) on the CIFAR-10 dataset, it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart ResNet architecture trained with TRADES achieves natural and robust accuracy of 76.29\% and 45.24\%, respectively. To understand this surprisingly good result, we further explore the mechanism underlying the adversarial robustness. We show that the adaptive-stepsize numerical ODE solver DOPRI5 has a gradient masking effect that defeats PGD attacks, which are sensitive to the gradient information of the training loss; on the other hand, it cannot fool the CW attack, which has robust gradients, or the SPSA attack, which is gradient-free. This provides a new explanation: the adversarial robustness of ODE-based networks mainly comes from obfuscated gradients in numerical ODE solvers.
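As a rough illustration of why a skew-symmetric ODE block is stable, the sketch below integrates the linear toy system $\dot{x} = Ax$ with $A = W - W^\top$ and checks that a small input perturbation is neither amplified nor damped by the flow. This is only a minimal sketch under stated assumptions: the actual SONet block is nonlinear and is not specified in this abstract, the matrix `W`, the inputs, and all function names are hypothetical, and a fixed-step classical RK4 integrator stands in for the adaptive DOPRI5 solver.

```python
import math


def skew_symmetric(w):
    """Return A = W - W^T, which satisfies A^T = -A."""
    n = len(w)
    return [[w[i][j] - w[j][i] for j in range(n)] for i in range(n)]


def matvec(a, x):
    """Plain matrix-vector product for lists of floats."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rk4_flow(a, x, t_end=1.0, steps=200):
    """Integrate dx/dt = A x from t = 0 to t_end with fixed-step RK4.

    Assumption: fixed-step RK4 replaces the adaptive DOPRI5 solver.
    """
    h = t_end / steps
    for _ in range(steps):
        k1 = matvec(a, x)
        k2 = matvec(a, [xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = matvec(a, [xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = matvec(a, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [
            xi + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for xi, p, q, r, s in zip(x, k1, k2, k3, k4)
        ]
    return x


def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))


# Hypothetical weights and inputs, chosen only for illustration.
W = [[0.3, -1.2, 0.5], [0.7, 0.1, -0.4], [-0.9, 0.8, 0.2]]
A = skew_symmetric(W)

x0 = [1.0, -0.5, 0.25]
delta = [0.01, -0.02, 0.015]  # small input perturbation
x0_pert = [a + b for a, b in zip(x0, delta)]

out_clean = rk4_flow(A, x0)
out_pert = rk4_flow(A, x0_pert)

gap_in = norm(delta)
gap_out = norm([a - b for a, b in zip(out_pert, out_clean)])

# Since x^T A x = 0 for skew-symmetric A, d/dt ||x||^2 = 0: the flow is an
# isometry, so the output gap matches the input gap up to integration error.
print("input gap :", gap_in)
print("output gap:", gap_out)
```

In this linear setting the output gap equals the input gap up to integration error, which is the simplest instance of the input-output stability the abstract refers to. It says nothing about the gradient masking effect of adaptive solvers, which would need a separate experiment.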