Adversarial learning has emerged as one of the most successful techniques for mitigating the susceptibility of existing methods to adversarial perturbations. However, the majority of existing defense methods are tailored to defend against a single category of adversarial perturbation (e.g., the $\ell_\infty$-attack). In safety-critical applications, this renders such methods ineffective, as the attacker can adopt diverse adversaries to deceive the system. Moreover, training on multiple perturbations simultaneously significantly increases the computational overhead of training. To address these challenges, we propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks. Its key component is the Meta Noise Generator (MNG), which outputs optimal noise to stochastically perturb a given sample so as to lower the error on diverse adversarial perturbations. By utilizing the samples generated by MNG, we train a model by enforcing label consistency across multiple perturbations. We validate the robustness of models trained with our scheme on various datasets and against a wide variety of perturbations, demonstrating that it significantly outperforms the baselines across multiple perturbations at a marginal computational cost.
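Below is a minimal PyTorch sketch of the training idea described above. The names (`MetaNoiseGenerator`, `consistency_loss`, `train_step`) and all hyperparameters are hypothetical stand-ins, not the paper's actual API, and the meta-learning (bilevel) update of the generator is simplified here to a single joint gradient step; the attack callables are assumed to be provided externally.

```python
# Hedged sketch: joint training of a classifier and a noise generator with a
# label-consistency objective across multiple perturbations. All names and
# hyperparameters are illustrative assumptions, not the paper's method verbatim.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaNoiseGenerator(nn.Module):
    """Maps a clean sample to a bounded stochastic perturbation of the same shape."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, x):
        # Stochasticity: condition the generator on a noisy copy of the input.
        return self.net(x + 0.1 * torch.randn_like(x))

def consistency_loss(logits_list):
    """Encourage consistent predictions across differently perturbed views
    by penalizing KL divergence of each view from the mean prediction."""
    probs = [F.softmax(l, dim=1) for l in logits_list]
    mean = torch.stack(probs).mean(dim=0)
    return sum(F.kl_div(p.clamp_min(1e-8).log(), mean, reduction="batchmean")
               for p in probs) / len(probs)

def train_step(classifier, mng, x, y, attacks, optimizer, eps=8 / 255, lam=1.0):
    """One simplified joint update of the classifier and the noise generator.
    `attacks` is a list of callables (e.g., l_inf / l_2 / l_1 adversaries)
    that each return a detached adversarial example for (x, y)."""
    x_mng = (x + eps * mng(x)).clamp(0.0, 1.0)            # MNG-perturbed sample
    x_advs = [atk(classifier, x, y) for atk in attacks]   # diverse adversarial views
    logits = [classifier(v) for v in [x_mng] + x_advs]
    loss = sum(F.cross_entropy(l, y) for l in logits) / len(logits)
    loss = loss + lam * consistency_loss(logits)          # label consistency term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the sketch, a single optimizer updates both networks, whereas the framework described above would update the generator via a meta-objective (lowering the classifier's error on the adversarial views); the joint step is used here only to keep the example self-contained.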