Model robustness against adversarial examples of a single perturbation type, such as $\ell_{p}$-norm bounded noise, has been widely studied, yet its generalization to more realistic scenarios involving multiple semantic perturbations and their composition remains largely unexplored. In this paper, we first propose a novel method for generating composite adversarial examples. By utilizing component-wise projected gradient descent and automatic attack-order scheduling, our method can find the optimal attack composition. We then propose \textbf{generalized adversarial training} (\textbf{GAT}) to extend model robustness from $\ell_{p}$-norm perturbations to composite semantic perturbations, such as combinations of Hue, Saturation, Brightness, Contrast, and Rotation. Results on the ImageNet and CIFAR-10 datasets show that GAT is robust not only to any single attack but also to any combination of multiple attacks. GAT also outperforms baseline $\ell_{\infty}$-norm bounded adversarial training approaches by a significant margin.
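The component-wise projected gradient descent described above can be illustrated with a minimal sketch: each semantic perturbation component (e.g., brightness, contrast) is optimized within its own parameter bound, one component at a time, following an attack order. The perturbation functions, toy surrogate loss, finite-difference gradients, and fixed order below are illustrative assumptions, not the paper's actual method, which uses model gradients and an automatic attack-order scheduler.

```python
import numpy as np

def perturb(x, params):
    # Hypothetical semantic perturbations: a brightness shift and a
    # contrast scale, applied to an image with values in [0, 1].
    return np.clip(params["contrast"] * x + params["brightness"], 0.0, 1.0)

def loss(x_adv):
    # Toy surrogate loss standing in for the model's classification loss.
    return float(np.mean((x_adv - 0.5) ** 2))

def composite_pgd(x, bounds, order, steps=20, lr=0.05, eps=1e-4):
    # Start from the identity perturbation for each component.
    params = {"brightness": 0.0, "contrast": 1.0}
    for name in order:          # attack order (fixed here for simplicity)
        lo, hi = bounds[name]
        for _ in range(steps):
            # Finite-difference gradient of the loss w.r.t. this component.
            p_plus = dict(params, **{name: params[name] + eps})
            p_minus = dict(params, **{name: params[name] - eps})
            g = (loss(perturb(x, p_plus)) - loss(perturb(x, p_minus))) / (2 * eps)
            # Signed gradient ascent step, projected back into the bound.
            params[name] = float(np.clip(params[name] + lr * np.sign(g), lo, hi))
    return perturb(x, params), params

x = np.linspace(0.0, 0.6, 16).reshape(4, 4)
bounds = {"brightness": (-0.2, 0.2), "contrast": (0.8, 1.2)}
x_adv, p = composite_pgd(x, bounds, order=["brightness", "contrast"])
```

Each component's attack stays inside its own semantic bound (e.g., a limited brightness shift), which is the composite analogue of the $\ell_{p}$-ball projection in standard PGD.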