Adversarial training is a promising method for improving robustness against adversarial attacks. To enhance its performance, recent methods assign high weights to the cross-entropy loss of important data points near the decision boundary. However, these importance-aware methods are vulnerable to sophisticated attacks such as Auto-Attack. In this paper, we experimentally investigate the cause of this vulnerability through the margins between the logit of the true label and the logits of the other labels, since these margins must be large enough to prevent the largest logit from being flipped by an attack. Our experiments reveal that the histogram of logit margins under naïve adversarial training has two peaks; that is, samples are roughly divided into two groups by how difficult it is to increase their logit margins: difficult samples (small logit margins) and easy samples (large logit margins). In contrast, the histogram for importance-aware methods has only one peak, near zero, i.e., these methods reduce the logit margins of easy samples. To increase the logit margins of difficult samples without reducing those of easy samples, we propose the switching one-versus-the-rest loss (SOVR), which switches from cross-entropy to the one-versus-the-rest (OVR) loss for difficult samples. We derive the trajectories of logit margins for a simple problem and prove that OVR increases logit margins twice as fast as the weighted cross-entropy loss. Thus, SOVR increases the logit margins of difficult samples, unlike existing methods. We experimentally show that SOVR achieves better robustness against Auto-Attack than importance-aware methods.
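The switching rule described above can be sketched as follows. This is a minimal NumPy illustration of the abstract's idea only, not the paper's implementation; the function names and the `threshold` value (here zero, i.e., switching when the true-class logit is no longer the largest) are illustrative assumptions:

```python
import numpy as np

def logit_margin(logits, y):
    """Margin between the true-class logit and the largest other logit."""
    others = np.delete(logits, y)
    return logits[y] - others.max()

def cross_entropy(logits, y):
    """Standard cross-entropy loss computed from raw logits."""
    z = logits - logits.max()  # shift for numerical stability
    return -(z[y] - np.log(np.exp(z).sum()))

def ovr_loss(logits, y):
    """One-versus-the-rest loss: one binary logistic loss per class."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sigmoid(logits[y]))          # true class should be positive
    for k in range(len(logits)):
        if k != y:
            loss -= np.log(sigmoid(-logits[k]))  # other classes should be negative
    return loss

def sovr_loss(logits, y, threshold=0.0):
    """Switch to OVR for difficult samples (small logit margin);
    keep cross-entropy for easy samples (large logit margin)."""
    if logit_margin(logits, y) < threshold:
        return ovr_loss(logits, y)
    return cross_entropy(logits, y)
```

For example, with logits `[2.0, 0.5, -1.0]` and true label 0 the margin is positive, so cross-entropy is used; with true label 1 the margin is negative, so the OVR loss is applied instead.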