In this paper, we initiate a rigorous study of the phenomenon of low-dimensional adversarial perturbations (LDAPs) in classification. Unlike the classical setting, these perturbations are restricted to a subspace of dimension $k$ that is much smaller than the dimension $d$ of the feature space. The case $k=1$ corresponds to so-called universal adversarial perturbations (UAPs; Moosavi-Dezfooli et al., 2017). First, we consider binary classifiers under generic regularity conditions (including ReLU networks) and compute analytical lower bounds for the fooling rate of any subspace. These bounds explicitly highlight the dependence of the fooling rate on the pointwise margin of the model (i.e., the ratio of the model's output at a test point to the $L_2$ norm of its gradient at that point), and on the alignment of the given subspace with the gradients of the model with respect to its inputs. Our results provide a rigorous explanation for the recent success of heuristic methods for efficiently generating low-dimensional adversarial perturbations. Finally, we show that if a decision region is compact, then it admits a universal adversarial perturbation whose $L_2$ norm is $\sqrt{d}$ times smaller than the typical $L_2$ norm of a data point. Our theoretical results are confirmed by experiments on both synthetic and real data.
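To make the roles of the pointwise margin and the subspace alignment concrete, the following is a minimal sketch for the simplest possible case, a linear binary classifier $f(x) = w^\top x + b$, whose gradient is $w$ at every point. It is an illustration under stated assumptions, not the paper's general bound: the fooling rate is taken here to be the fraction of points whose predicted sign can be flipped by some perturbation of $L_2$ norm at most `eps` lying in the subspace, and all names (`pointwise_margin`, `subspace_alignment`, `fooling_rate`, the Gaussian data, the values of `d`, `k`, `eps`) are hypothetical choices made for this example.

```python
import numpy as np


def pointwise_margin(w, b, X):
    """|f(x)| / ||grad f(x)||_2 for the linear model f(x) = w.x + b."""
    return np.abs(X @ w + b) / np.linalg.norm(w)


def subspace_alignment(w, V):
    """||V^T w||_2 / ||w||_2 for V (d x k) with orthonormal columns; in [0, 1]."""
    return np.linalg.norm(V.T @ w) / np.linalg.norm(w)


def fooling_rate(w, b, X, V, eps):
    """Fraction of rows of X whose sign can be flipped by a perturbation
    in span(V) of L2 norm at most eps (exact for a linear model)."""
    # The best perturbation in span(V) moves f by at most eps * ||V^T w||,
    # so the sign flips iff margin(x) < eps * alignment.
    return float(np.mean(pointwise_margin(w, b, X) < eps * subspace_alignment(w, V)))


# Synthetic setup (assumed for illustration only).
rng = np.random.default_rng(0)
d, k, n, eps = 50, 5, 2000, 1.0
w = rng.normal(size=d)
b = 0.0
X = rng.normal(size=(n, d))

# A random k-dimensional subspace (orthonormal basis via reduced QR).
V_random, _ = np.linalg.qr(rng.normal(size=(d, k)))

# A k-dimensional subspace that contains the gradient direction w.
M = rng.normal(size=(d, k))
M[:, 0] = w
V_aligned, _ = np.linalg.qr(M)

rate_random = fooling_rate(w, b, X, V_random, eps)
rate_aligned = fooling_rate(w, b, X, V_aligned, eps)
print(f"alignment (random):  {subspace_alignment(w, V_random):.3f}")
print(f"alignment (aligned): {subspace_alignment(w, V_aligned):.3f}")
print(f"fooling rate (random):  {rate_random:.3f}")
print(f"fooling rate (aligned): {rate_aligned:.3f}")
```

In this linear sketch the fooling rate depends on the subspace only through its alignment with $w$, and on the data only through the distribution of margins, which mirrors the two quantities the bounds above are stated in terms of. For nonlinear models the gradient varies with the test point, so this exact characterization no longer applies.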