We show that when the image domain $[0,1]^d$ is also taken into account, established $l_1$-projected gradient descent (PGD) attacks are suboptimal, as they do not consider that the effective threat model is the intersection of the $l_1$-ball and $[0,1]^d$. We study the expected sparsity of the steepest descent step for this effective threat model and show that the exact projection onto this set is computationally feasible and yields better performance. Moreover, we propose an adaptive form of PGD which is highly effective even with a small budget of iterations. Our resulting $l_1$-APGD is a strong white-box attack, showing that prior works overestimated the $l_1$-robustness of their models. Using $l_1$-APGD for adversarial training, we obtain a robust classifier with SOTA $l_1$-robustness. Finally, we combine $l_1$-APGD and an adaptation of the Square Attack to $l_1$ into $l_1$-AutoAttack, an ensemble of attacks which reliably assesses adversarial robustness for the threat model of the $l_1$-ball intersected with $[0,1]^d$.
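To make the effective threat model concrete, the sketch below performs one ascent step on a perturbation $\delta$ constrained to $\{\delta : \|\delta\|_1 \le \epsilon,\ x + \delta \in [0,1]^d\}$. It is a minimal illustration under stated assumptions, not the authors' $l_1$-APGD. The function names `project_l1_box`, `sparse_l1_step` and `l1_pgd_step`, the fixed sparsity `k`, and the step size `alpha` are all hypothetical. The projection uses the coordinate-wise form $\delta_i = \operatorname{sign}(z_i)\min(\max(|z_i| - \lambda, 0), u_i)$, where $u_i$ is the room left inside the box in the direction of $z_i$. The threshold $\lambda$ is located by bisection, a simple stand-in that converges to the exact projection up to numerical tolerance; this is not necessarily how the authors compute it. The step moves only along the $k$ largest gradient coordinates that are not already pinned against a face of $[0,1]^d$.

```python
from typing import List


def project_l1_box(z: List[float], x: List[float], eps: float,
                   iters: int = 100) -> List[float]:
    """Project z onto {d : ||d||_1 <= eps, 0 <= x + d <= 1} (bisection sketch)."""
    # u_i: distance from x_i to the box face in the direction of sign(z_i).
    u = [1.0 - xi if zi >= 0 else xi for zi, xi in zip(z, x)]

    def shrink(lam: float) -> List[float]:
        # Soft-threshold |z_i| by lam, then clip to the box bound u_i.
        return [(1.0 if zi >= 0 else -1.0) * min(max(abs(zi) - lam, 0.0), ui)
                for zi, ui in zip(z, u)]

    d = shrink(0.0)  # plain clipping to the box
    if sum(abs(v) for v in d) <= eps:
        return d  # the l1 constraint is inactive
    lo, hi = 0.0, max(abs(zi) for zi in z)  # ||shrink(hi)||_1 == 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(abs(v) for v in shrink(mid)) > eps:
            lo = mid
        else:
            hi = mid
    return shrink(hi)  # hi always stays on the feasible side


def sparse_l1_step(grad: List[float], x: List[float], delta: List[float],
                   k: int) -> List[float]:
    """Sign step on the k largest |grad_i| among coordinates free to move."""
    free = []
    for i, (g, xi, di) in enumerate(zip(grad, x, delta)):
        p = xi + di
        # Skip coordinates already on the box face that sign(g) pushes into.
        if (g > 0 and p < 1.0) or (g < 0 and p > 0.0):
            free.append(i)
    top = sorted(free, key=lambda i: abs(grad[i]), reverse=True)[:k]
    step = [0.0] * len(grad)
    for i in top:
        # Spread unit l1 mass evenly over the selected coordinates.
        step[i] = (1.0 if grad[i] > 0 else -1.0) / len(top)
    return step


def l1_pgd_step(x: List[float], delta: List[float], grad: List[float],
                eps: float, alpha: float, k: int) -> List[float]:
    """One projected ascent step on the loss w.r.t. the perturbation delta."""
    s = sparse_l1_step(grad, x, delta, k)
    z = [di + alpha * si for di, si in zip(delta, s)]
    return project_l1_box(z, x, eps)
```

For example, with `x = [0.5, 0.9, 0.1, 0.0]`, `z = [0.4, 0.4, -0.4, -0.3]` and `eps = 0.5`, clipping to the box alone already cuts the last three coordinates to `0.1`, `-0.1` and `0.0`. Only the first coordinate is then shrunk, giving approximately `[0.3, 0.1, -0.1, 0.0]`. An $l_1$ projection that ignores the box would instead spread the shrinkage over all coordinates and leave $x + \delta$ outside $[0,1]^d$.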