We consider adversarial training of deep neural networks through the lens of Bayesian learning, and present a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees. We rely on techniques from constraint relaxation of non-convex optimisation problems and modify the standard cross-entropy error model to enforce posterior robustness to worst-case perturbations in $\epsilon$-balls around input points. We illustrate how the resulting framework can be combined with methods commonly employed for approximate inference of BNNs. In an empirical investigation, we demonstrate that the presented approach enables training of certifiably robust models on MNIST, FashionMNIST and CIFAR-10 and can also be beneficial for uncertainty calibration. Our method is the first to directly train certifiable BNNs, thus facilitating their deployment in safety-critical applications.
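The core idea — replacing the standard cross-entropy with a loss on worst-case logits over an $\epsilon$-ball — can be sketched with interval bound propagation, one common constraint-relaxation technique for this kind of non-convex problem. This is an illustrative NumPy sketch under that assumption, not the paper's exact formulation; the network here is a hypothetical small ReLU MLP given as a list of `(W, b)` layers.

```python
import numpy as np

def interval_affine(l, u, W, b):
    """Propagate an interval [l, u] through the affine map x -> W @ x + b."""
    mu, r = (l + u) / 2.0, (u - l) / 2.0
    center = W @ mu + b
    radius = np.abs(W) @ r  # worst-case spread of the affine map
    return center - radius, center + radius

def worst_case_logits(x, y, eps, layers):
    """Bound the logits over the eps-ball around x, then form the worst-case
    logit vector for true class y: lower bound on the true-class logit,
    upper bounds on all other classes."""
    l, u = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        l, u = interval_affine(l, u, W, b)
        if i < len(layers) - 1:          # ReLU on hidden layers (monotone,
            l, u = np.maximum(l, 0.0), np.maximum(u, 0.0)  # so bounds pass through)
    z = u.copy()
    z[y] = l[y]
    return z

def robust_cross_entropy(x, y, eps, layers):
    """Cross-entropy evaluated on the worst-case logits; at eps = 0 this
    reduces to the standard cross-entropy error model."""
    z = worst_case_logits(x, y, eps, layers)
    z = z - z.max()                      # numerical stability
    return -z[y] + np.log(np.exp(z).sum())
```

In a Bayesian setting this loss would replace the likelihood's negative log term, so that approximate-inference methods fit a posterior that is penalised for worst-case perturbations rather than only for clean inputs.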