We identify properties of universal adversarial perturbations (UAPs) that distinguish them from standard adversarial perturbations. Specifically, we show that targeted UAPs generated by projected gradient descent exhibit two human-aligned properties that standard targeted adversarial perturbations lack: semantic locality and spatial invariance. We also demonstrate that UAPs contain significantly less signal for generalization than standard adversarial perturbations; that is, UAPs leverage non-robust features to a smaller extent than standard adversarial perturbations do.
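
To make the phrase "targeted UAPs generated by projected gradient descent" concrete, here is a minimal sketch on a toy problem. It is not the procedure used in this work: the linear softmax classifier, the l-infinity constraint, the signed-gradient step, and every name and default value (`targeted_uap_pgd`, `eps`, `step`, `n_steps`) are assumptions chosen for illustration. The point it shows is only what separates a universal perturbation from a standard one: a single `delta` is optimized against the loss averaged over many inputs, instead of one perturbation per input.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def targeted_uap_pgd(W, b, X, target, eps=1.0, step=0.05, n_steps=100):
    """Sketch of a targeted universal perturbation via PGD.

    Assumed toy model: logits = x @ W + b, with W of shape (d, k).
    Assumed threat model: ||delta||_inf <= eps (an illustrative choice).
    Returns one vector delta of shape (d,) shared by every row of X.
    """
    n, d = X.shape
    onehot = np.zeros(W.shape[1])
    onehot[target] = 1.0
    delta = np.zeros(d)
    for _ in range(n_steps):
        probs = softmax((X + delta) @ W + b)           # shape (n, k)
        # Gradient of cross-entropy toward `target` w.r.t. the input,
        # averaged over all inputs because delta is shared.
        grad = ((probs - onehot) @ W.T).mean(axis=0)   # shape (d,)
        delta = delta - step * np.sign(grad)           # descend the target loss
        delta = np.clip(delta, -eps, eps)              # project onto the l_inf ball
    return delta


def target_rate(W, b, X, target):
    """Fraction of inputs the toy model assigns to `target`."""
    return float((np.argmax(X @ W + b, axis=1) == target).mean())
```

A standard targeted perturbation would run the same loop with a separate `delta` for each input and no averaging. Comparing `target_rate(W, b, X + delta, target)` with `target_rate(W, b, X, target)` on random data gives a quick check that the shared perturbation moves predictions toward the target class.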