In adversarial machine learning, deep neural networks can fit the adversarial examples in the training set yet generalize poorly to the test set. This phenomenon, called robust overfitting, is observed when adversarially training neural networks on common datasets, including SVHN, CIFAR-10, CIFAR-100, and ImageNet. In this paper, we study the robust overfitting issue of adversarial training using tools from uniform stability. One major challenge is that the outer function (a maximization of the inner function over the perturbation set) is nonsmooth, so the standard technique (e.g., Hardt et al., 2016) cannot be applied. Our approach is to consider $\eta$-approximate smoothness: we show that the outer function satisfies this modified smoothness assumption with $\eta$ being a constant related to the adversarial perturbation budget $\epsilon$. Based on this, we derive stability-based generalization bounds for stochastic gradient descent (SGD) on the general class of $\eta$-approximately smooth functions, which covers the adversarial loss. Our results suggest that robust test accuracy decreases in $\epsilon$ when the number of training iterations $T$ is large, with a speed between $\Omega(\epsilon\sqrt{T})$ and $\mathcal{O}(\epsilon T)$. This phenomenon is also observed in practice. Additionally, we show that several popular techniques for adversarial training (e.g., early stopping, cyclic learning rates, and stochastic weight averaging) are stability-promoting in theory.
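For concreteness, the two objects the abstract refers to can be written as follows. This is a sketch of one standard formalization; the loss notation $\ell$, the norm on $\delta$, and the exact form of the additive slack are our assumptions rather than details taken from the paper:

```latex
% Outer (adversarial) loss: the inner loss maximized over an epsilon-ball.
h(w; x, y) \;=\; \max_{\|\delta\| \le \epsilon} \ell(w; x + \delta, y)

% Eta-approximate smoothness: the gradient is Lipschitz up to an additive
% slack eta, where eta is a constant related to epsilon (per the abstract).
\|\nabla_w h(w_1; x, y) - \nabla_w h(w_2; x, y)\|
  \;\le\; L\,\|w_1 - w_2\| + \eta
  \qquad \text{for all } w_1, w_2 .
```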
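The min-max structure above is what adversarial training optimizes in practice: an inner maximization approximated by projected gradient ascent, wrapped in an outer SGD step. Below is a minimal, self-contained PyTorch sketch of that loop, assuming the common PGD recipe of Madry et al. (2018); the hyperparameters, toy model, and random data are placeholders, not the paper's experimental setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """Approximate the inner maximization: find a perturbation delta with
    ||delta||_inf <= eps that (approximately) maximizes the loss at x + delta."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient-sign ascent step, projected back onto the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()

# Toy model and random data so the sketch runs end to end (placeholders).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

for step in range(100):                # T outer SGD iterations
    delta = pgd_attack(model, x, y)    # inner max over the eps-ball
    loss = F.cross_entropy(model(x + delta), y)  # outer loss h(w)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Early stopping in this loop simply caps the number of outer iterations $T$, which is consistent with the abstract's picture of a generalization gap that grows with $T$.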