Adversarial training is an approach for increasing the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge of producing adversarial examples is to introduce enough perturbation to flip the model's output without severely changing the example's semantic content. Excessive change in the semantic content could also change the true label of the example, and adding such examples to the training set has adverse effects. In this paper, we present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error. We provide a theoretical analysis of the calibrated robust error and derive an upper bound for it. Our empirical results show the superior performance of Calibrated Adversarial Training on a number of public datasets.
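For context, the sketch below illustrates the standard adversarial training setup the abstract refers to, in which PGD-generated adversarial examples replace clean inputs in each training step. It is a minimal generic example, not the paper's Calibrated Adversarial Training; the model, data, and attack parameters (epsilon, alpha, num_steps) are illustrative assumptions.

```python
# Minimal sketch of standard adversarial training with PGD-generated examples.
# This is a generic baseline, not the paper's Calibrated Adversarial Training.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, num_steps=10):
    """Generate adversarial examples inside an L-infinity ball of radius epsilon."""
    x_adv = x.clone().detach() + torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = x_adv.clamp(0, 1)
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step that uses adversarial examples instead of clean inputs."""
    model.eval()                      # freeze batch-norm statistics during the attack
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```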