Adversarial training is well known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble producing certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a "convex relaxation barrier" that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) techniques, and even when using state-of-the-art computational facilities. In this paper, we propose a nonconvex certification technique based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes certifications comparable in strength to those of much more expensive SDP methods, while optimizing over dramatically fewer variables, comparable to much weaker LP methods. Despite the nonconvexity, we show how off-the-shelf local optimization algorithms can be used both to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap toward exact certification of adversarially trained models.
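To give a sense of the low-rank restriction idea, the sketch below applies a Burer-Monteiro-style factorization to a classic SDP (the MaxCut relaxation), not to the paper's certification problem: the PSD matrix variable X is replaced by X = V Vᵀ with a tall-thin factor V, eliminating the semidefinite cone and shrinking the variable count from O(n²) to O(nr), at the cost of nonconvexity that an off-the-shelf local solver then handles. All names and the choice of test problem here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative low-rank (Burer-Monteiro) restriction of an SDP:
#   min <C, X>  s.t.  diag(X) = 1,  X PSD   (MaxCut relaxation).
# Substituting X = V V^T, V in R^{n x r}, removes the PSD constraint
# and reduces the variables from O(n^2) to O(n r). The factored
# problem is nonconvex, but a generic local optimizer typically
# finds the global optimum when r is modestly large.

rng = np.random.default_rng(0)
n, r = 8, 3
A = rng.standard_normal((n, n))
C = (A + A.T) / 2  # symmetric cost matrix

def objective(v):
    V = v.reshape(n, r)
    # Normalize rows so diag(V V^T) = 1 holds by construction.
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return np.trace(C @ V @ V.T)

v0 = rng.standard_normal(n * r)
res = minimize(objective, v0, method="L-BFGS-B")

# Recover the (feasible, PSD by construction) matrix X = V V^T.
V = res.x.reshape(n, r)
V = V / np.linalg.norm(V, axis=1, keepdims=True)
X = V @ V.T
print("objective value:", np.trace(C @ X))
```

Note that feasibility (diag(X) = 1 and X ⪰ 0) is automatic from the factorization, which is what lets a plain unconstrained solver like L-BFGS-B be used; certifying that the local solution is globally optimal, as the paper proposes, requires an additional check not shown in this sketch.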