Sparse adversarial attacks fool deep neural networks (DNNs) through minimal pixel perturbations, often regularized by the $\ell_0$ norm. Recent efforts have replaced this norm with a structural sparsity regularizer, such as the nuclear group norm, to craft group-wise sparse adversarial attacks. The resulting perturbations are thus explainable and hold significant practical relevance, shedding light on an even greater vulnerability of DNNs. However, crafting such attacks poses an optimization challenge, as it involves computing norms for groups of pixels within a non-convex objective. We address this by presenting a two-phase algorithm that generates group-wise sparse attacks within semantically meaningful areas of an image. Initially, we optimize a quasinorm adversarial loss using the $1/2$-quasinorm proximal operator tailored for non-convex programming. Subsequently, the algorithm transitions to a projected Nesterov's accelerated gradient descent with $2$-norm regularization applied to perturbation magnitudes. Rigorous evaluations on CIFAR-10 and ImageNet datasets demonstrate a remarkable increase in group-wise sparsity, e.g., $50.9\%$ on CIFAR-10 and $38.4\%$ on ImageNet (average case, targeted attack). This performance improvement is accompanied by significantly faster computation times, improved explainability, and a $100\%$ attack success rate.
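The first phase relies on the proximal operator of the $1/2$-quasinorm, which, unlike the general non-convex case, has a known closed-form solution: the half-thresholding rule of Xu et al. (2012). The sketch below is an illustrative elementwise implementation of that operator, not the paper's actual code; the function name `prox_half` and its scalar-regularizer interface are assumptions for this example.

```python
import numpy as np

def prox_half(y, lam):
    """Elementwise proximal operator of lam * sum_i |x_i|^{1/2}:
        argmin_x 0.5*(x - y)^2 + lam*|x|^{1/2}
    via the closed-form half-thresholding rule (Xu et al., 2012).
    Note: the classical formula is stated for the objective
    (x - y)^2 + mu*|x|^{1/2}, so we rescale mu = 2*lam below."""
    y = np.asarray(y, dtype=float)
    mu = 2.0 * lam
    # Inputs below this magnitude are thresholded exactly to zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * mu ** (2.0 / 3.0)
    out = np.zeros_like(y)
    keep = np.abs(y) > thresh
    t = np.abs(y[keep])
    # arccos argument is guaranteed in (0, 1/sqrt(2)] whenever |y| > thresh
    phi = np.arccos((mu / 8.0) * (t / 3.0) ** (-1.5))
    out[keep] = (2.0 / 3.0) * y[keep] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - (2.0 / 3.0) * phi)
    )
    return out
```

Because the operator is separable across coordinates, it scales to full-image perturbations at the cost of a few elementwise transcendental evaluations, which is what makes a proximal phase with this quasinorm practical inside the attack loop; in the paper's setting it is applied group-wise rather than per pixel.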