Availability attacks, which poison the training data with imperceptible perturbations, can make the data \emph{unexploitable} by machine learning algorithms and thereby prevent unauthorized use of data. In this work, we investigate why these perturbations work in principle. We are the first to unveil an important population property of the perturbations of these attacks: they are almost \textbf{linearly separable} when assigned the target labels of the corresponding samples, and hence can serve as \emph{shortcuts} for the learning objective. We further verify that linear separability is indeed the workhorse of availability attacks. We synthesize linearly separable perturbations as attacks and show that they are as powerful as deliberately crafted attacks. Moreover, such synthetic perturbations are much easier to generate. For example, previous attacks need dozens of hours to generate perturbations for ImageNet, while our algorithm needs only several seconds. Our finding also suggests that \emph{shortcut learning} is more pervasive than previously believed, as deep models rely on shortcuts even when they are imperceptible in scale and mixed with normal features. Our source code is published at \url{https://github.com/dayu11/Availability-Attacks-Create-Shortcuts}.
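To make the linear-separability property concrete, the following minimal sketch builds class-wise synthetic perturbations around random class-specific directions and checks that a simple linear classifier separates them when paired with their target labels. It is an illustration under stated assumptions, not the exact procedure from the released code: the CIFAR-10-like shapes, the $\ell_\infty$ budget of $8/255$, the noise scale, and the use of \texttt{numpy} and \texttt{scikit-learn} are all choices made here for exposition.

\begin{verbatim}
# Illustrative sketch: class-wise perturbations that are linearly
# separable by construction, plus a check with a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_classes, dim, per_class = 10, 3 * 32 * 32, 100  # CIFAR-10-like (assumed)
eps = 8 / 255                                        # l_inf budget (assumed)

# One random unit direction per class; each class's perturbations are
# small noise around that direction.
class_centers = rng.normal(size=(num_classes, dim))
class_centers /= np.linalg.norm(class_centers, axis=1, keepdims=True)

perturbations, labels = [], []
for c in range(num_classes):
    noise = 0.1 * rng.normal(size=(per_class, dim))
    delta = class_centers[c] + noise
    # Rescale each perturbation so its largest entry has magnitude eps.
    delta = eps * delta / np.abs(delta).max(axis=1, keepdims=True)
    perturbations.append(delta)
    labels.append(np.full(per_class, c))

X = np.concatenate(perturbations)
y = np.concatenate(labels)

# Linear-separability check on (perturbation, target label) pairs.
clf = LogisticRegression(max_iter=2000).fit(X, y)
print("Linear-model training accuracy on perturbations:", clf.score(X, y))
\end{verbatim}

With the small noise scale used here, the class-wise clusters barely overlap, so the linear model fits the (perturbation, label) pairs almost perfectly; this mirrors how such perturbations can act as shortcuts that a deep model latches onto instead of the normal features.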