Past work exploring adversarial vulnerability has largely focused on settings where an adversary can perturb all dimensions of the model input. In contrast, a range of recent works consider the case where an adversary can perturb either (i) a limited number of input parameters or (ii) a subset of the modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ of the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $\epsilon \left(\frac{\dim(V)}{\dim \mathcal{X}}\right)^{\frac{1}{q}}$, where $\epsilon$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} = 1$, provided $p > 1$ (the case $p = 1$ presents additional subtleties, which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high-dimensional spaces.
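The scaling above can be illustrated with a minimal numerical sketch of the toy linear model argument. The sketch below is our own illustration, not code from the paper: it assumes a linear model $f(x) = w \cdot x$ with i.i.d. Gaussian weights, takes $V$ to be the span of the first $k$ coordinates, and uses Hölder duality, under which the largest change in $f$ achievable with $\|\delta\|_p \le \epsilon$ and $\delta \in V$ is $\epsilon \|P_V w\|_q$. For i.i.d. weights this grows like $(\dim V)^{1/q}$, recovering the $(\dim(V)/\dim\mathcal{X})^{1/q}$ factor.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000           # ambient dimension dim(X)
p = 2.0              # attack norm exponent; q is its Hölder conjugate
q = p / (p - 1.0)

# Toy linear model f(x) = w . x with i.i.d. Gaussian weights.
w = rng.standard_normal(D)

def max_output_change(k, eps=1.0):
    """Best-case |f(x + delta) - f(x)| over perturbations with
    ||delta||_p <= eps supported on the first k coordinates.
    By Hölder duality this equals eps * ||w restricted to V||_q."""
    return eps * np.linalg.norm(w[:k], ord=q)

# Compare the measured ratio against the predicted (dim V / dim X)^(1/q).
full = max_output_change(D)
for k in [100, 1_000, 10_000]:
    measured = max_output_change(k) / full
    predicted = (k / D) ** (1.0 / q)
    print(f"dim(V)={k:>6}  measured={measured:.3f}  predicted={predicted:.3f}")
```

Running this, the measured ratios track the $(\dim V/\dim\mathcal{X})^{1/q}$ prediction up to small fluctuations from the random weights, and are monotonically increasing in $\dim(V)$, consistent with the claimed functional form.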