Modern machine learning and deep learning models have been shown to be vulnerable when test data are slightly perturbed. Existing theoretical studies of adversarial training algorithms mostly focus on either adversarial training losses or local convergence properties. In contrast, this paper studies the generalization performance of a generic adversarial training algorithm. Specifically, we consider linear regression models and two-layer neural networks (with lazy training) using squared loss, in both low-dimensional and high-dimensional regimes. In the former regime, we show that once the non-smoothness of adversarial training is overcome, the adversarial risk of the trained models converges to the minimal adversarial risk. In the latter regime, we discover that data interpolation prevents the adversarially robust estimator from being consistent. Therefore, inspired by the success of the least absolute shrinkage and selection operator (LASSO), we incorporate an L1 penalty into high-dimensional adversarial learning and show that it leads to consistent adversarially robust estimation. A series of numerical studies is conducted to demonstrate how smoothness and L1 penalization help improve the adversarial robustness of DNN models.
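The abstract itself gives no code, but the linear-regression setting it describes admits a short illustration. For a linear model f(x) = theta^T x with an l2-bounded perturbation ||delta|| <= eps, the inner maximization of the squared loss has the well-known closed form max_{||delta||<=eps} (y - theta^T (x + delta))^2 = (|y - theta^T x| + eps ||theta||)^2, so adversarial training reduces to (sub)gradient descent on this surrogate; adding an L1 term gives the penalized variant discussed above. The sketch below is illustrative only: the function names, step size, and the choice of an l2 perturbation set are assumptions for this example, not the paper's exact algorithm.

```python
import numpy as np

def adversarial_squared_loss(theta, X, y, eps):
    """Closed-form adversarial squared loss for a linear model.

    For f(x) = x @ theta with ||delta||_2 <= eps,
      max_delta (y - (x + delta) @ theta)^2
        = (|y - x @ theta| + eps * ||theta||_2)^2.
    """
    resid = np.abs(y - X @ theta)
    return np.mean((resid + eps * np.linalg.norm(theta)) ** 2)

def adversarial_train(X, y, eps, lam=0.0, lr=1e-2, n_iter=5000):
    """(Sub)gradient descent on the adversarial risk plus an L1 penalty.

    lam = 0 recovers plain adversarial training; lam > 0 is the
    LASSO-style penalized variant for the high-dimensional regime.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        resid = y - X @ theta
        norm = np.linalg.norm(theta)
        adv = np.abs(resid) + eps * norm  # per-sample adversarial residual
        # Gradient of mean (|r| + eps||theta||)^2:
        #   residual term: -(2/n) * X^T (adv * sign(r))
        #   perturbation term: 2*eps*mean(adv) * theta/||theta||
        grad = (-2.0 / n) * X.T @ (adv * np.sign(resid))
        if norm > 0:
            grad += 2.0 * eps * np.mean(adv) * theta / norm
        grad += lam * np.sign(theta)  # L1 subgradient
        theta -= lr * grad
    return theta

# Hypothetical usage on a sparse ground truth (d features, 3 active):
rng = np.random.default_rng(0)
n, d = 200, 10
theta_star = np.zeros(d)
theta_star[:3] = 1.0
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = adversarial_train(X, y, eps=0.1, lam=0.05)
print(adversarial_squared_loss(theta_hat, X, y, eps=0.1))
```

Note that the closed form makes the objective non-smooth in theta (through |r| and ||theta||), which is exactly the difficulty the low-dimensional analysis must overcome; the subgradient update above simply ignores the kinks.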