Adversarial training (AT) has been demonstrated to be one of the most promising defenses against various adversarial attacks. To our knowledge, existing AT-based methods usually train with the locally most adversarial perturbed points and treat all perturbed points equally, which may lead to considerably weaker adversarial robust generalization on test data. In this work, we introduce a new adversarial training framework that accounts for both the diversity and the characteristics of the perturbed points in the vicinity of benign samples. To realize the framework, we propose a Regional Adversarial Training (RAT) defense method that first utilizes the attack path generated by the typical iterative attack method of projected gradient descent (PGD), and constructs an adversarial region based on that attack path. RAT then efficiently samples diverse perturbed training points inside this region, and applies a distance-aware label smoothing mechanism to capture our intuition that perturbed points at different locations should have different impacts on model performance. Extensive experiments on several benchmark datasets show that RAT consistently and significantly improves over standard adversarial training (SAT), and exhibits better robust generalization.
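The pipeline the abstract describes (PGD attack path, region construction, in-region sampling, distance-aware label smoothing) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the uniform convex-combination sampler, and the linear distance-to-smoothing mapping are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def pgd_attack_path(x, grad_fn, eps, alpha, steps):
    """Run L_inf PGD and record every intermediate perturbed point (the attack path)."""
    path, x_adv = [], x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project back into the eps-ball
        path.append(x_adv.copy())
    return path

def sample_in_region(x, path, rng):
    """Sample a diverse perturbed point from the adversarial region around x.

    Hypothetical sampler: pick a random anchor on the attack path, then a random
    convex combination of the anchor and the benign input.
    """
    anchor = path[rng.integers(len(path))]
    t = rng.uniform()
    return (1.0 - t) * x + t * anchor

def smoothed_label(x, x_pert, eps, y, n_classes, max_smooth=0.5):
    """Distance-aware label smoothing: points farther from x get softer labels."""
    d = np.abs(x_pert - x).max() / eps          # normalized L_inf distance in [0, 1]
    s = max_smooth * d                          # assumed linear distance-to-smoothing map
    soft = np.full(n_classes, s / n_classes)    # spread mass uniformly over classes
    soft[y] += 1.0 - s                          # remaining mass on the true label
    return soft

# Toy usage: a dummy gradient so the sketch runs end to end.
x = np.zeros(4)
grad_fn = lambda z: np.ones_like(z)             # stand-in for the loss gradient
eps, alpha, steps = 0.3, 0.1, 8
path = pgd_attack_path(x, grad_fn, eps, alpha, steps)
x_pert = sample_in_region(x, path, rng)
soft = smoothed_label(x, x_pert, eps, y=2, n_classes=10)
```

In an actual training loop, `x_pert` with label `soft` would replace the single locally-most-adversarial point that SAT trains on, so the model sees varied perturbation magnitudes with correspondingly softened supervision.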