This paper presents a comprehensive study of adversarial attacks on human pose estimation models and evaluates their robustness. Besides highlighting the important differences between well-studied classification systems and human pose estimation systems with respect to adversarial attacks, we provide insights into the design choices of pose estimation systems to shape future work. We benchmark the robustness of several 2D single-person pose estimation architectures trained on multiple datasets (MPII and COCO). In doing so, we also explore the problem of attacking non-classification networks, including regression-based networks, which has remained virtually unexplored.

We find that, compared to classification and semantic segmentation, human pose estimation architectures are relatively robust to adversarial attacks, with single-step attacks being surprisingly ineffective. Our study shows that heatmap-based pose estimation models are notably more robust than their direct regression-based counterparts, and that systems which explicitly model anthropomorphic semantics of the human body fare better than those that do not. In addition, targeted attacks are more difficult to obtain than untargeted ones, and some body joints are easier to fool than others. We present visualizations of universal perturbations to offer insight into how they act on pose estimation, and we show that these perturbations generalize well across different networks. Finally, we perform a user study on the perceptibility of these adversarial examples.
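
To make the distinction between single-step and iterative attacks on a regression network concrete, the following is a minimal sketch and not the attack setup used in this study. It assumes a toy linear regressor standing in for a pose network, a squared-error loss, and an L-infinity budget `eps`. All names (`predict`, `fgsm`, `iterative`, `W`) are hypothetical and chosen for illustration. On a linear model the two attacks behave almost identically, so the sketch shows only the mechanics and does not reproduce the robustness findings reported above.

```python
def predict(W, x):
    """Toy linear 'pose regressor' (an assumption): input vector -> joint coordinates."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss(W, x, y):
    """Squared-error regression loss between predicted and target coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predict(W, x), y))

def grad_x(W, x, y):
    """Gradient of the loss with respect to the input: 2 * W^T (W x - y)."""
    r = [p - t for p, t in zip(predict(W, x), y)]
    return [2 * sum(W[j][i] * r[j] for j in range(len(W))) for i in range(len(x))]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(W, x, y, eps):
    """Single-step untargeted attack: one signed-gradient step of size eps."""
    g = grad_x(W, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def iterative(W, x, y, eps, steps=10):
    """Iterative untargeted attack: small signed steps, clipped to the eps-ball."""
    alpha = eps / steps  # per-step size (a hypothetical choice)
    x_adv = list(x)
    for _ in range(steps):
        g = grad_x(W, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back so that every coordinate stays within eps of the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

if __name__ == "__main__":
    W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]  # made-up weights
    x = [1.0, 2.0, -1.0]                      # made-up clean input
    y = [0.1, 1.6]                            # made-up ground-truth coordinates
    print("clean loss:      ", loss(W, x, y))
    print("single-step loss:", loss(W, fgsm(W, x, y, 0.1), y))
    print("iterative loss:  ", loss(W, iterative(W, x, y, 0.1), y))
```

In a heatmap-based network the same steps would apply with the loss taken over predicted heatmaps rather than coordinates; that variant is not shown here.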