Adversarial training (AT) is one of the most effective strategies for promoting model robustness. However, recent benchmarks show that most proposed improvements to AT are less effective than simply early stopping the training procedure. This counter-intuitive fact motivates us to investigate the implementation details of dozens of AT methods. Surprisingly, we find that the basic settings (e.g., weight decay and training schedule) used in these methods are highly inconsistent. In this work, we provide comprehensive evaluations on CIFAR-10, focusing on the effects of mostly overlooked training tricks and hyperparameters for adversarially trained models. Our empirical observations suggest that adversarial robustness is far more sensitive to some basic training settings than previously thought. For example, a slightly different value of weight decay can reduce robust accuracy by more than 7%, which is likely to overwhelm any improvement contributed by a proposed method. We distill a baseline training setting and re-implement previous defenses to achieve new state-of-the-art results. These findings also call for more attention to the overlooked confounders when benchmarking defenses.
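To make concrete where the basic settings discussed above enter the training pipeline, the following is a minimal PyTorch sketch of PGD-based adversarial training on CIFAR-10. The specific values (weight decay, learning-rate milestones, attack budget, architecture) are illustrative assumptions for the sketch, not the recommended configuration distilled in this work.

```python
# Minimal sketch of L-inf PGD adversarial training on CIFAR-10 (PyTorch).
# Hyperparameter values below are illustrative assumptions only.
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-inf PGD adversarial examples for a batch (x, y)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

def train(epochs=110):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torchvision.models.resnet18(num_classes=10).to(device)
    loader = torch.utils.data.DataLoader(
        torchvision.datasets.CIFAR10("data", train=True, download=True,
                                     transform=T.ToTensor()),
        batch_size=128, shuffle=True)
    # weight_decay is one of the "basic settings" flagged above: small
    # changes (e.g., 1e-4 vs. 5e-4) can shift robust accuracy by several points.
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Piecewise-constant schedule; early stopping near the first decay is
    # the strong baseline that many proposed improvements fail to beat.
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[100, 105],
                                                 gamma=0.1)
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)   # inner maximization
            opt.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()  # outer minimization
            opt.step()
        sched.step()
    return model
```

Even in this short sketch, several choices (attack steps, batch-norm mode during attack crafting, schedule milestones) are exactly the kind of overlooked confounders that can dominate the measured difference between defenses.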