We present a new algorithm to learn a deep neural network model that is robust against adversarial attacks. Previous algorithms demonstrate that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing this information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrates significantly improved robustness, up to 20%, compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both the CIFAR-10 and STL-10 datasets.
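To make the information gain idea concrete, below is a minimal illustrative sketch, not the paper's exact objective: it assumes a PGD adversary with 0.035 distortion and adds a symmetric KL divergence between the predictive distributions on benign and adversarial inputs to the adversarial cross-entropy loss. The function names, the form of the divergence term, and the weight `lam` are assumptions made for illustration only.

```python
# Hedged sketch: one plausible way to penalize the gap between what a model
# learns from benign vs. adversarial inputs. Not the paper's exact method.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.035, alpha=0.01, steps=10):
    """Standard L-infinity PGD attack within a ball of radius eps (illustrative)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def info_gain_regularized_loss(model, x, y, eps=0.035, lam=1.0):
    """Adversarial cross-entropy plus a divergence term that pushes the
    predictive distributions on benign and adversarial inputs to agree."""
    x_adv = pgd_attack(model, x, y, eps=eps)
    logits_clean, logits_adv = model(x), model(x_adv)
    ce = F.cross_entropy(logits_adv, y)
    logp_clean = F.log_softmax(logits_clean, dim=-1)
    logp_adv = F.log_softmax(logits_adv, dim=-1)
    # Symmetric KL between the two predictive distributions (assumed surrogate
    # for the information gain term described in the abstract).
    div = 0.5 * (F.kl_div(logp_adv, logp_clean, log_target=True, reduction="batchmean")
                 + F.kl_div(logp_clean, logp_adv, log_target=True, reduction="batchmean"))
    return ce + lam * div
```

In this sketch, driving the divergence term toward zero forces the model's predictions on perturbed inputs to match those on benign inputs, which is the intuition behind the claim that minimizing the information gain objective lets the adversarial risk approach the conventional empirical risk.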