The robustness of deep neural networks (DNNs) against adversarial example attacks has attracted wide attention. For smoothed classifiers, we propose the worst-case adversarial loss over input distributions as a robustness certificate. Compared with previous certificates, ours better describes the empirical performance of smoothed classifiers. By exploiting duality and the smoothness property, we derive an easy-to-compute upper bound as a surrogate for the certificate, and we adopt a noisy adversarial learning procedure that minimizes this surrogate loss to improve model robustness. We show that our training method provides a theoretically tighter bound than distributionally robust base classifiers. Experiments on a variety of datasets further demonstrate the superior robustness of our method over state-of-the-art certified and heuristic methods.
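The abstract refers to smoothed classifiers, i.e., classifiers whose prediction is the majority vote of a base classifier under Gaussian input noise. The sketch below is a minimal, hypothetical illustration of that construction only (it is not the paper's certificate or training procedure); the function name, `sigma`, and `n_samples` are illustrative choices.

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n_samples=1000, rng=None):
    """Monte Carlo estimate of the smoothed classifier
    g(x) = argmax_c P(classifier(x + z) = c), z ~ N(0, sigma^2 I).
    `classifier` maps an input array to an integer class label."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample Gaussian perturbations around the input.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    # Classify each noisy copy and take the majority vote.
    preds = np.array([classifier(x + z) for z in noise])
    classes, counts = np.unique(preds, return_counts=True)
    return classes[np.argmax(counts)]
```

For example, with a toy linear classifier `f = lambda v: int(v.sum() > 0)`, an input far from the decision boundary keeps its label under smoothing, which is the intuition behind smoothing-based robustness certificates.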