Deep neural networks are highly vulnerable to adversarial examples: inputs with small, carefully crafted perturbations that cause misclassification. This makes adversarial attacks a critical tool for evaluating robustness. Existing black-box methods typically entail a trade-off between precision and flexibility: pixel-sparse attacks (e.g., single- or few-pixel attacks) provide fine-grained control but lack adaptability, whereas patch- or frequency-based attacks improve efficiency or transferability at the cost of larger, less precise perturbations. We present GreedyPixel, a fine-grained black-box attack that performs brute-force-style, per-pixel greedy optimization guided by a surrogate-derived priority map and refined via query feedback. It evaluates each coordinate directly without any gradient information, guaranteeing monotonic loss reduction and convergence to a coordinate-wise optimum, while yielding near white-box precision, pixel-wise sparsity, and high perceptual quality. On CIFAR-10 and ImageNet, across convolutional neural networks (CNNs) and Transformer models, GreedyPixel achieved state-of-the-art success rates with visually imperceptible perturbations, effectively bridging the gap between black-box practicality and white-box performance. The implementation is available at https://github.com/azrealwang/greedypixel.
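The per-pixel greedy loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `query_loss` (a scalar black-box loss queried per candidate) and `priority` (a surrogate-derived per-pixel importance map) are assumptions for the sketch, and the accept-only-improving-moves rule is what guarantees monotonic loss reduction.

```python
import numpy as np

def greedy_pixel_attack(x, query_loss, priority, eps=8 / 255):
    """Illustrative GreedyPixel-style sketch: visit pixels in priority
    order and keep a +/-eps per-channel change only if the queried
    black-box loss strictly decreases (monotonic by construction)."""
    adv = x.copy()
    best = query_loss(adv)
    h, w, c = x.shape
    # Highest-priority pixels (per the surrogate-derived map) first.
    order = np.argsort(priority.ravel())[::-1]
    for flat in order:
        i, j = divmod(int(flat), w)
        for ch in range(c):
            for sign in (+1.0, -1.0):
                cand = adv.copy()
                cand[i, j, ch] = np.clip(x[i, j, ch] + sign * eps, 0.0, 1.0)
                loss = query_loss(cand)
                if loss < best:  # query feedback: accept only improving moves
                    adv, best = cand, loss
    return adv, best
```

For example, with a toy linear loss standing in for the black-box model, the loop provably never increases the loss, and the perturbation stays within the per-pixel budget `eps`.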