Self-Ensemble Adversarial Training for Improved Robustness

Owing to the many breakthroughs that machine intelligence has brought to real-world applications, deep neural networks (DNNs) are widely employed in critical applications. However, the predictions of DNNs are easily manipulated by imperceptible adversarial perturbations, which impedes their further deployment and may have profound security and privacy implications. Among defense methods, adversarial training, which incorporates adversarial samples into the training data pool, is the strongest principled strategy against a variety of adversarial attacks. Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space. None of them, however, taps the potential of the classifiers obtained from standard adversarial training, especially the states along the search trajectory of training. In this work, we focus on the weight states of models throughout the training process and devise a simple but powerful Self-Ensemble Adversarial Training (SEAT) method that yields a robust classifier by averaging the weights of history models. This considerably improves the robustness of the target model against several well-known adversarial attacks, even when only the naive cross-entropy loss is used for supervision. We also discuss the relationship between the ensemble of predictions from different adversarially trained models and the prediction of weight-ensembled models, and we provide theoretical and empirical evidence that the proposed self-ensemble method yields a smoother loss landscape and better robustness than both individual models and the ensemble of predictions from different classifiers. We further analyze a subtle but fatal issue in the general settings for the self-ensemble model, which causes the weight-ensembled method to deteriorate in the late phases of training.
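
As a rough illustration of what "averaging weights of history models" can look like, the sketch below folds a sequence of weight snapshots into a single set of weights. The abstract does not say which averaging scheme is used, so the exponential moving average here, the `decay` value, and the names `ema_update` and `self_ensemble` are all assumptions made for illustration, not the authors' implementation. Weights are plain Python lists so that the example runs without any framework.

```python
from typing import Dict, List

# A model's weights, keyed by parameter name (hypothetical toy representation).
Weights = Dict[str, List[float]]


def ema_update(avg: Weights, current: Weights, decay: float) -> Weights:
    """Return decay * avg + (1 - decay) * current, parameter by parameter."""
    return {
        name: [decay * a + (1.0 - decay) * c
               for a, c in zip(avg[name], current[name])]
        for name in avg
    }


def self_ensemble(history: List[Weights], decay: float = 0.999) -> Weights:
    """Fold weight snapshots taken along one training run into averaged weights.

    Assumption: an exponential moving average stands in for the averaging
    scheme, which the abstract leaves unspecified.
    """
    # Start from a copy of the first snapshot so the inputs are not mutated.
    avg = {name: list(values) for name, values in history[0].items()}
    for snapshot in history[1:]:
        avg = ema_update(avg, snapshot, decay)
    return avg


# Toy "history models": three snapshots of a two-weight, one-bias model.
history = [
    {"w": [0.0, 2.0], "b": [1.0]},
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [2.0, 2.0], "b": [1.0]},
]

averaged = self_ensemble(history, decay=0.5)
print(averaged)  # {'w': [1.25, 2.0], 'b': [0.75]}
```

In an actual adversarial training loop the snapshots would be the network parameters after each optimizer step, and the averaged weights, rather than the latest ones, would be loaded into the model used for evaluation. Unlike an ensemble of predictions, this keeps inference cost at that of a single model.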
