Defending deep neural networks against adversarial examples is a key challenge for AI safety. To improve robustness effectively, recent methods focus adversarial training on important data points near the decision boundary. However, these methods are vulnerable to Auto-Attack, an ensemble of parameter-free attacks designed for reliable evaluation. In this paper, we experimentally investigate the causes of this vulnerability and find that existing methods reduce the margins between the logit for the true label and the logits for the other labels while their gradient norms remain non-small. Small logit margins combined with non-small gradient norms cause the vulnerability, since a small perturbation can easily flip the largest logit. Our experiments also show that the histogram of logit margins has two peaks, i.e., small and large logit margins. Based on these observations, we propose switching one-versus-the-rest loss (SOVR), which applies a one-versus-the-rest loss to data points with small logit margins so as to increase those margins. We find that SOVR increases logit margins more than existing methods while keeping gradient norms small, and that it outperforms them in robustness against Auto-Attack.
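To make the switching idea concrete, the following is a minimal PyTorch sketch of a per-sample loss that switches between a sigmoid-based one-versus-the-rest loss and standard cross-entropy according to the logit margin. The specific OVR formulation, the margin threshold `margin_threshold`, and the function name `sovr_loss` are illustrative assumptions, not the paper's exact definition.

```python
import torch
import torch.nn.functional as F


def sovr_loss(logits, targets, margin_threshold=1.0):
    """Sketch of a switching one-versus-the-rest (SOVR) style loss.

    Samples whose logit margin (true-class logit minus the largest other
    logit) falls below `margin_threshold` receive a one-versus-the-rest
    loss; the remaining samples receive standard cross-entropy.
    The threshold and OVR form here are assumptions for illustration.
    """
    num_classes = logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()

    # Logit margin: z_y - max_{k != y} z_k
    true_logit = logits.gather(1, targets.unsqueeze(1)).squeeze(1)
    other_max = logits.masked_fill(one_hot.bool(), float("-inf")).max(dim=1).values
    margin = true_logit - other_max

    # One-versus-the-rest loss: K independent binary (sigmoid) losses per sample.
    ovr = F.binary_cross_entropy_with_logits(
        logits, one_hot, reduction="none"
    ).sum(dim=1)

    # Standard cross-entropy loss per sample.
    ce = F.cross_entropy(logits, targets, reduction="none")

    # Switch: use OVR for small-margin samples, cross-entropy otherwise.
    use_ovr = (margin < margin_threshold).float()
    return (use_ovr * ovr + (1.0 - use_ovr) * ce).mean()
```

In an adversarial training loop, this loss would simply replace the cross-entropy term applied to adversarial examples; the switching decision is recomputed per sample at every step, matching the two-peak margin histogram observed in the experiments.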