Deep neural networks can be fooled by adversarial attacks: adding carefully computed, small adversarial perturbations to clean inputs can cause misclassification on state-of-the-art machine learning models. The reason is that neural networks fail to accommodate the distribution drift of the input data caused by adversarial perturbations. Here, we present a new solution, Beneficial Perturbation Network (BPN), which defends against adversarial attacks by fixing this distribution drift. During training, BPN generates and leverages beneficial perturbations (somewhat opposite to the well-known adversarial perturbations) by adding new, out-of-network biasing units. The biasing units influence the parameter space of the network to preempt and neutralize future adversarial perturbations on input data samples. To achieve this, BPN creates reverse adversarial attacks during training, at very little cost, by recycling the training gradients that have already been computed. The reverse attacks are captured by the biasing units, and the biases can in turn effectively defend against future adversarial examples. Reverse attacks are a shortcut: they affect the network's parameters directly, without requiring the explicit instantiation of the adversarial examples that would otherwise assist training. We provide comprehensive empirical evidence showing that 1) BPN is robust to adversarial examples and is much more memory- and computation-efficient than classical adversarial training; 2) BPN can defend against adversarial examples with negligible additional computation and parameter costs compared to training only on clean examples; 3) BPN hurts accuracy on clean examples much less than classical adversarial training; 4) BPN can improve the generalization of the network; and 5) BPN trained only with the Fast Gradient Sign Attack can generalize to defend against PGD attacks.
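To make the mechanism concrete, below is a minimal, hedged sketch of one plausible reading of the abstract: out-of-network biasing units attached to each layer are updated with an FGSM-style sign step in the direction that *decreases* the loss (a "reverse attack"), recycling the gradients from the ordinary backward pass. The class and variable names (BiasedLinear, bias_units, epsilon) and the exact update rule are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only; assumes a simple fully connected classifier in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedLinear(nn.Module):
    """Linear layer augmented with out-of-network bias units (hypothetical design)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Extra additive biasing units living in parameter space, outside the
        # ordinary weights of the network.
        self.bias_units = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return self.linear(x) + self.bias_units

model = nn.Sequential(BiasedLinear(784, 256), nn.ReLU(), BiasedLinear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epsilon = 0.01  # step size of the "reverse attack" on the bias units (assumed value)

def train_step(x, y):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()  # single backward pass; its gradients are recycled below
    with torch.no_grad():
        for m in model:
            if isinstance(m, BiasedLinear):
                # Reverse adversarial (beneficial) perturbation: step each bias
                # unit in the sign direction that reduces the loss, i.e. the
                # opposite of an FGSM attack on the input.
                m.bias_units -= epsilon * m.bias_units.grad.sign()
                m.bias_units.grad.zero_()  # keep SGD from updating them again
    optimizer.step()  # ordinary update of the regular network weights
    return loss.item()

# Example usage on random data:
# train_step(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```

Note that no adversarial examples are instantiated here; the only extra cost over clean training is the sign update of the biasing units, which is the efficiency argument made in the abstract.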