Deep neural networks are widely recognized as vulnerable to adversarial perturbations. Overcoming this weakness requires learning robust classifiers. To date, two well-known defenses have been adopted to this end, namely adversarial training (AT) and Jacobian regularization. However, the two approaches behave differently against adversarial perturbations. We first analyze and characterize these two schools of approaches, both theoretically and empirically, to demonstrate how each impacts the robust learning of a classifier. We then propose our novel Optimal Transport with Jacobian Regularization method, dubbed OTJR, which jointly incorporates input-output Jacobian regularization into AT by leveraging optimal transport theory. In particular, we employ the Sliced Wasserstein (SW) distance, which efficiently pushes the representations of adversarial samples closer to those of clean samples, regardless of the number of classes in the dataset. The SW distance also provides the movement directions of the adversarial samples, which make the Jacobian regularization substantially more informative. Extensive experiments demonstrate the effectiveness of the proposed method: it consistently enhances model robustness on the CIFAR-100 dataset under various adversarial attack settings, achieving up to 28.49% robust accuracy under AutoAttack.
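As a rough illustration of the core quantity involved (not the paper's implementation), the sliced Wasserstein distance between a batch of clean feature representations and the corresponding adversarial ones can be estimated by projecting both batches onto random directions and comparing the sorted 1-D projections. The function name and batch shapes below are assumptions for the sketch:

```python
import numpy as np

def sliced_wasserstein(x_clean, x_adv, n_projections=50, seed=0):
    """Monte-Carlo estimate of the sliced Wasserstein-1 distance between
    two equally sized batches of feature vectors (shape: batch x dim)."""
    rng = np.random.default_rng(seed)
    d = x_clean.shape[1]
    # Sample random unit directions on the d-dimensional sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both batches onto each direction: (n_projections, batch).
    proj_clean = theta @ x_clean.T
    proj_adv = theta @ x_adv.T
    # In 1-D, the Wasserstein-1 distance between empirical measures of
    # equal size is the mean absolute difference of sorted samples.
    proj_clean.sort(axis=1)
    proj_adv.sort(axis=1)
    return np.mean(np.abs(proj_clean - proj_adv))
```

Because each slice reduces to a 1-D sort, the cost is independent of the number of classes, which is what makes the SW distance attractive here; the per-slice sorted matching also yields, for each adversarial sample, a direction toward its matched clean counterpart.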