We propose a principled framework that combines adversarial training and provable robustness verification for training certifiably robust neural networks. We formulate training as a joint optimization problem with both empirical and provable robustness objectives and develop a novel gradient-descent technique that eliminates bias in stochastic multi-gradients. We provide a theoretical analysis of the convergence of the proposed technique and an experimental comparison with state-of-the-art methods. Results on MNIST and CIFAR-10 show that our method consistently matches or outperforms prior approaches for provable ℓ∞ robustness. Notably, we achieve 6.60% verified test error on MNIST at ε = 0.3, and 66.57% on CIFAR-10 at ε = 8/255.
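The paper's bias-corrected stochastic multi-gradient method is not specified in this abstract; as a rough illustration of the underlying multi-objective idea, a minimal sketch of combining an empirical-robustness gradient and a provable-robustness gradient via the closed-form two-objective min-norm weighting (MGDA-style, a standard technique and an assumption here, not the authors' exact algorithm) might look like:

```python
import numpy as np

def min_norm_combination(g1, g2):
    """Return alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2.

    g1, g2: flattened gradients of the two objectives (e.g. the
    empirical/adversarial loss and the provable/certified loss).
    The minimizer of the convex quadratic has the closed form
    alpha = <g2, g2 - g1> / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        return 0.5  # gradients coincide; any weighting is equivalent
    alpha = float(np.dot(g2, g2 - g1)) / denom
    return float(np.clip(alpha, 0.0, 1.0))

# Toy usage: two orthogonal unit gradients are weighted equally,
# giving a common descent direction for both objectives.
g_emp = np.array([1.0, 0.0])   # gradient of the empirical objective
g_prov = np.array([0.0, 1.0])  # gradient of the provable objective
alpha = min_norm_combination(g_emp, g_prov)
step_direction = alpha * g_emp + (1.0 - alpha) * g_prov
```

In a training loop, the combined direction would replace the single-loss gradient in each descent step; the paper's contribution is making this combination unbiased under stochastic (minibatch) gradient estimates, which this deterministic sketch does not capture.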