Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affects the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it admits a closed form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first-order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB) computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets, we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations.
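As a rough illustration of the Linear Robust Optimization fact underlying aRUB (a minimal sketch, not the paper's exact formulation): once the network is linearized around an input, the worst case of a linear score w·x over an l-infinity ball of radius rho has the closed form w·x + rho·||w||_1, where the l1 norm is the dual of the l-infinity norm.

```python
import numpy as np

# Hedged sketch: worst-case value of a linear score over an
# l-infinity perturbation ball, via the dual-norm closed form.
# This is the basic robust-optimization identity that aRUB applies
# after taking a first-order approximation of the network.
rng = np.random.default_rng(0)
w = rng.normal(size=5)   # linearized weights (illustrative)
x = rng.normal(size=5)   # nominal input (illustrative)
rho = 0.1                # perturbation radius

# Closed form: max_{||delta||_inf <= rho} w @ (x + delta)
closed_form = w @ x + rho * np.linalg.norm(w, 1)

# The maximizer sets each coordinate of delta to rho * sign(w_i),
# so evaluating there recovers the closed-form value.
delta_star = rho * np.sign(w)
print(np.isclose(w @ (x + delta_star), closed_form))  # True
```

The same dual-norm argument gives rho·||w||_2 for an l2 ball; the point is only that the inner maximization disappears into a norm penalty, which is what makes such a bound trainable by backpropagation.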