Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems. In particular, deep neural network (DNN) methods have significantly reduced estimation errors for crowd counting tasks. Recent studies have demonstrated that DNNs are vulnerable to adversarial attacks, i.e., normal images with human-imperceptible perturbations can mislead DNNs into making false predictions. In this work, we propose a robust attack strategy called Adversarial Patch Attack with Momentum (APAM) to systematically evaluate the robustness of crowd counting models, where the attacker's goal is to create an adversarial perturbation that severely degrades their performance, thus leading to public safety accidents (e.g., stampedes). Specifically, the proposed attack leverages the extreme-density background information of input images to generate robust adversarial patches via a series of transformations (e.g., interpolation, rotation, etc.). We observe that by perturbing less than 6\% of image pixels, our attacks severely degrade the performance of crowd counting systems, both digitally and physically. To further enhance the adversarial robustness of crowd counting models, we propose the first regression model-based Randomized Ablation (RA), which is more effective than Adversarial Training (ADT): the Mean Absolute Error of RA is 5 lower than ADT on clean samples and 30 lower than ADT on adversarial examples. Extensive experiments on five crowd counting models demonstrate the effectiveness and generality of the proposed method. The supplementary materials and certified retrained models are available at \url{https://www.dropbox.com/s/hc4fdx133vht0qb/ACM_MM2021_Supp.pdf?dl=0}
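The momentum-accumulated patch update over random transformations can be sketched as follows. This is a minimal illustrative NumPy example, not the paper's implementation: a linear "counting model" stands in for a real crowd counter (such a CNN would supply gradients via backpropagation), rotations are restricted to 90-degree steps, and all hyperparameters (`mu`, `alpha`, iteration count) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "counting model": predicted count = <W, image>. A real crowd counter
# would be a CNN; this linear stand-in keeps the gradient analytic.
W = rng.normal(size=(32, 32))

def predict(img):
    return float((W * img).sum())

def apply_patch(img, patch, y, x, k):
    """Paste a k*90-degree-rotated copy of the patch at (y, x)."""
    out = img.copy()
    p = np.rot90(patch, k)
    h, w = p.shape
    out[y:y + h, x:x + w] = p
    return out

img = rng.uniform(0, 1, size=(32, 32))
patch = rng.uniform(0, 1, size=(8, 8))
patch0 = patch.copy()

# MI-FGSM-style momentum update, averaged over random placements and
# rotations as a stand-in for the paper's transformation series.
g, mu, alpha = np.zeros_like(patch), 0.9, 0.05
for _ in range(50):
    y, x = rng.integers(0, 24, size=2)
    k = int(rng.integers(0, 4))
    # d(count)/d(patch): the model weights under the patch, rotated back.
    grad = np.rot90(W[y:y + 8, x:x + 8], -k)
    # Momentum accumulation on the L1-normalized gradient, then a signed step.
    g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
    patch = np.clip(patch + alpha * np.sign(g), 0.0, 1.0)
```

Maximizing the predicted count (gradient ascent) models an attacker inflating the estimate; minimizing it instead would hide the crowd. Averaging updates over random placements and rotations is what makes the patch robust to how it appears in the scene.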