Past work exploring adversarial vulnerability has focused on situations where an adversary can perturb all dimensions of the model input. In contrast, a range of recent works consider the case where an adversary can perturb only (i) a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both cases, adversarial examples are effectively constrained to a subspace $V$ of the ambient input space $\mathcal{X}$. Motivated by this, we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon (\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$, where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} = 1$, provided $p > 1$ (the case $p=1$ presents additional subtleties, which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high-dimensional spaces.
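To make the functional form concrete, the following is a minimal sketch of one way such a toy linear calculation could go. It is not necessarily the construction used in this work. It assumes a linear score $f(x) = w \cdot x$, a coordinate subspace $V$ (the adversary perturbs only the first $k = \dim(V)$ of $n = \dim \mathcal{X}$ coordinates), and weights of equal magnitude $|w_i| = c$. Under those assumptions, Hölder duality gives a maximal score shift of $\epsilon \, c \, n^{1/q} (k/n)^{1/q}$. The names `max_linear_shift` and `predicted_shift` are hypothetical and introduced only for illustration.

```python
import math


def max_linear_shift(weights, coords, eps, p):
    """Largest |w . delta| over delta supported on `coords` with ||delta||_p <= eps.

    By Holder duality this equals eps * ||w restricted to coords||_q,
    where 1/p + 1/q = 1. Assumption: V is a coordinate subspace.
    """
    if p <= 1:
        raise ValueError("this sketch covers p > 1 only")
    restricted = [abs(weights[i]) for i in coords]
    if math.isinf(p):
        return eps * sum(restricted)  # dual exponent q = 1
    q = p / (p - 1.0)
    return eps * sum(a ** q for a in restricted) ** (1.0 / q)


def predicted_shift(c, n, k, eps, p):
    """Closed form for equal-magnitude weights |w_i| = c (an assumption):
    eps * c * n^(1/q) * (k/n)^(1/q)."""
    inv_q = 1.0 - (0.0 if math.isinf(p) else 1.0 / p)
    return eps * c * n ** inv_q * (k / n) ** inv_q


# Illustrative values only; they are not taken from the paper.
n, c, eps = 1000, 0.5, 0.1
weights = [c if i % 2 == 0 else -c for i in range(n)]

for p in (2.0, 4.0, math.inf):
    for k in (10, 100, 1000):
        exact = max_linear_shift(weights, range(k), eps, p)
        approx = predicted_shift(c, n, k, eps, p)
        # The dual-norm value matches the closed form in this toy setting.
        assert math.isclose(exact, approx, rel_tol=1e-9)
```

In this toy setting the achievable shift depends on $\epsilon$ and $k$ only through $\epsilon (k/n)^{1/q}$, so any success criterion that thresholds the shift is monotone in that quantity. As $p \to 1$ the dual exponent $q \to \infty$ and the dependence on $k$ disappears from this expression, which is consistent with $p = 1$ needing separate treatment.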