Adversarial training (AT) has been shown to reliably improve a network's robustness against adversarial data. However, current AT with a pre-specified perturbation budget has limitations in learning a robust network. First, applying a pre-specified perturbation budget to networks of different model capacities yields divergent degrees of robustness disparity, that is, the gap between natural and robust accuracy, which deviates from the desideratum of a robust network. Second, the attack strength of adversarial training data constrained by a pre-specified perturbation budget fails to increase as network robustness grows, which leads to robust overfitting and further degrades adversarial robustness. To overcome these limitations, we propose \emph{Strength-Adaptive Adversarial Training} (SAAT). Specifically, the adversary employs an adversarial loss constraint to generate adversarial training data. Under this constraint, the perturbation budget is adaptively adjusted according to the training state of the adversarial data, which can effectively avoid robust overfitting. In addition, SAAT explicitly constrains the attack strength of the training data through the adversarial loss, which manipulates the scheduling of model capacity during training, and can thereby flexibly control the degree of robustness disparity and adjust the tradeoff between natural accuracy and robustness. Extensive experiments show that our proposal boosts the robustness of adversarial training.
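The idea of generating adversarial data under a loss constraint rather than a fixed perturbation budget can be illustrated with a small sketch. It is not the SAAT algorithm itself: it assumes a toy linear logistic model, signed-gradient ascent steps, and a simple stopping rule (stop as soon as the adversarial loss reaches a target value). The names `loss_constrained_attack`, `target_loss`, `step`, and `max_steps` are hypothetical and chosen only for illustration.

```python
import math


def _logit(w, b, x):
    """Linear score w.x + b of the toy model (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def logistic_loss(w, b, x, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    z = _logit(w, b, x)
    # Numerically stable softplus(z) = log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return softplus - y * z


def loss_grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    z = _logit(w, b, x)
    # Numerically stable sigmoid.
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        p = math.exp(z) / (1.0 + math.exp(z))
    return [(p - y) * wi for wi in w]


def loss_constrained_attack(w, b, x, y, target_loss, step=0.05, max_steps=100):
    """Ascend the loss with signed-gradient steps until it reaches target_loss.

    Returns the perturbed input and the L-infinity size of the perturbation
    actually used, which varies per example instead of being fixed in advance.
    """
    x_adv = list(x)
    for _ in range(max_steps):
        if logistic_loss(w, b, x_adv, y) >= target_loss:
            break  # loss constraint met: do not strengthen the attack further
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x_adv, g)]
    used_budget = max(abs(a - o) for a, o in zip(x_adv, x))
    return x_adv, used_budget
```

In this sketch, an example the model classifies with a large margin needs more steps, and hence a larger effective budget, to reach `target_loss` than an example near the decision boundary, while an example whose loss already meets the target is left unperturbed. That is the sense in which the budget adapts to the training state of each example.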