Training certifiable neural networks yields models with robustness guarantees against adversarial attacks. In this work, we introduce a framework that bounds the adversary-free region in the neighborhood of the input data by a polyhedral envelope, which yields finer-grained certified robustness. We further introduce polyhedral envelope regularization (PER), which encourages larger polyhedral envelopes and thereby improves the provable robustness of the models. We demonstrate the flexibility and effectiveness of our framework on standard benchmarks; it applies to networks with different architectures and general activation functions. Compared with state-of-the-art methods, PER incurs very little computational overhead and provides better robustness guarantees without over-regularizing the model.
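
To make the geometric idea concrete, the following is a minimal sketch for the special case of a linear classifier, where the adversary-free region around an input is exactly a polyhedron (one half-space per competing class). It is an illustration under stated assumptions, not the framework or the PER regularizer introduced in this work: the function name `certified_radius_linear` and the toy weights are hypothetical, and a deep network would first need linear bounds on its outputs, which this sketch does not compute.

```python
import numpy as np


def certified_radius_linear(W, b, x, y, p=np.inf):
    """Distance (in the l_p norm) from x to the nearest decision boundary
    of the linear classifier f(x) = W x + b, assuming y is the true label.

    The region where class y stays on top is the polyhedron
        { d : (W[y] - W[j]) . d + (f_y - f_j) >= 0  for all j != y },
    so the certified radius is the smallest distance to any of its faces.
    Returns 0.0 if x is already misclassified.
    """
    logits = W @ x + b

    # Dual exponent q with 1/p + 1/q = 1 (distance to a hyperplane in l_p
    # is the margin divided by the l_q norm of the hyperplane normal).
    if p == np.inf:
        q = 1
    elif p == 1:
        q = np.inf
    else:
        q = p / (p - 1)

    radius = np.inf
    for j in range(W.shape[0]):
        if j == y:
            continue
        margin = logits[y] - logits[j]
        normal_norm = np.linalg.norm(W[y] - W[j], ord=q)
        if normal_norm == 0:
            # Classes y and j respond identically to perturbations:
            # the margin never changes, so this face never binds
            # unless the margin is already non-positive.
            if margin <= 0:
                return 0.0
            continue
        radius = min(radius, margin / normal_norm)

    return max(0.0, float(radius))


# Toy example (hypothetical numbers): two classes in two dimensions.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x = np.array([2.0, 0.0])

r_inf = certified_radius_linear(W, b, x, y=0, p=np.inf)
r_two = certified_radius_linear(W, b, x, y=0, p=2)
```

In this toy case the single face of the polyhedron is the line where both logits are equal, so the radius is the margin divided by the dual norm of the difference of the two weight rows. A larger polyhedron corresponds directly to a larger certified radius, which is the quantity a regularizer of this kind would aim to increase.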