In adversarial machine learning, the popular $\ell_\infty$ threat model has been the focus of much previous work. While this mathematical definition of imperceptibility successfully captures an infinite set of additive image transformations that a model should be robust to, these are only a subset of all transformations that leave the semantic label of an image unchanged. Indeed, previous work has also considered robustness to spatial attacks and other semantic transformations; however, designing defenses against the composition of spatial and $\ell_{\infty}$ perturbations remains relatively underexplored. In the following, we improve the understanding of this seldom-investigated compositional setting. We prove theoretically that, in a simple statistical setting, no linear classifier can achieve more than trivial accuracy against a composite adversary, illustrating the difficulty of the problem. We then investigate how state-of-the-art $\ell_{\infty}$ defenses can be adapted to this novel threat model and study their performance against compositional attacks. We find that our newly proposed TRADES$_{\text{All}}$ strategy performs best. Analyzing the Lipschitz constant of its logits under rotation-translation (RT) transformations of different magnitudes, we find that TRADES$_{\text{All}}$ remains stable over a wide range of RT transformations, both with and without $\ell_\infty$ perturbations.
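As a minimal illustration of the compositional threat model (not the paper's attack implementation), the sketch below composes a rotation-translation with an additive perturbation projected onto the $\ell_\infty$ ball of radius `eps`. The function name, the toy image, and all parameter values are illustrative assumptions; a real attack would optimize both components against a target model.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def composite_attack(image, angle_deg, translation, delta, eps):
    """Illustrative composition of a rotation-translation (RT) transform
    with an l_inf-bounded additive perturbation. All arguments are
    hypothetical; a real adversary would optimize them."""
    # Spatial part: rotate about the centre (shape preserved), then translate.
    spatial = rotate(image, angle_deg, reshape=False, order=1)
    spatial = shift(spatial, translation, order=1)
    # Additive part: project delta onto the l_inf ball of radius eps.
    delta = np.clip(delta, -eps, eps)
    # Keep the result a valid image in [0, 1].
    return np.clip(spatial + delta, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((32, 32))                      # toy grayscale image in [0, 1]
delta = rng.uniform(-0.1, 0.1, size=x.shape)  # candidate additive noise
x_adv = composite_attack(x, angle_deg=10.0, translation=(2, -1),
                         delta=delta, eps=8 / 255)
```

Note that the two components do not commute with the pixel-value clipping, so the order of composition (spatial transform first, then additive noise) is itself a modeling choice.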