Deep neural networks are susceptible to adversarial attacks because adversarial perturbations accumulate at the feature level, and numerous works have boosted model robustness by deactivating the non-robust feature activations that cause model mispredictions. However, we claim that these malicious activations still contain discriminative cues and that, with recalibration, they can capture additional useful information for correct model predictions. To this end, we propose a novel, easy-to-plug-in approach named Feature Separation and Recalibration (FSR) that recalibrates the malicious, non-robust activations to produce more robust feature maps through Separation and Recalibration. The Separation part disentangles the input feature map into the robust feature, with activations that help the model make correct predictions, and the non-robust feature, with activations that are responsible for model mispredictions under adversarial attack. The Recalibration part then adjusts the non-robust activations to restore the potentially useful cues for model predictions. Extensive experiments verify the superiority of FSR compared to traditional deactivation techniques and demonstrate that it improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead. Code is available at https://github.com/wkim97/FSR.
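To make the separation-and-recalibration idea concrete, below is a minimal PyTorch sketch of an FSR-style block, not the authors' implementation (which is in the linked repository). It assumes a soft-mask formulation: a hypothetical `FSRBlock` learns a per-element mask that splits an intermediate feature map into robust and non-robust parts, recalibrates the non-robust part with a small convolutional branch, and recombines the two. All module names and layer choices here are illustrative assumptions.

```python
# Hypothetical sketch of an FSR-style module (names and layers are assumptions,
# not the authors' code). A learned soft mask separates a feature map into
# "robust" and "non-robust" parts; the non-robust part is recalibrated and
# added back instead of being deactivated.
import torch
import torch.nn as nn


class FSRBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Separation: predict a per-element soft mask in [0, 1].
        self.separation = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )
        # Recalibration: adjust the non-robust activations to recover useful cues.
        self.recalibration = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        mask = self.separation(feat)                    # soft robustness mask
        robust = feat * mask                            # activations that aid correct prediction
        non_robust = feat * (1.0 - mask)                # activations blamed for mispredictions
        recalibrated = self.recalibration(non_robust)   # restore potentially useful cues
        return robust + recalibrated                    # recombined, more robust feature map


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)   # dummy intermediate feature map
    out = FSRBlock(64)(x)
    print(out.shape)                 # torch.Size([2, 64, 32, 32])
```

In this sketch the block is input/output shape-preserving, so it could be inserted after an intermediate layer of an existing backbone during adversarial training; the paper should be consulted for the actual separation/recalibration architecture and auxiliary losses.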