Semantic Image Attack for Visual Model Diagnosis

In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models. This is partly because obtaining a balanced, diverse, and perfectly labeled dataset is typically expensive, time-consuming, and error-prone. Rather than relying on a carefully designed test set to assess ML models' failures, fairness, or robustness, this paper proposes Semantic Image Attack (SIA), an adversarial-attack-based method that produces semantic adversarial images to support model diagnosis, interpretability, and robustness. Adversarial training is a popular methodology for making ML models robust against attacks. However, existing adversarial methods do not combine the two aspects that enable the interpretation and analysis of a model's flaws: semantic traceability and perceptual quality. SIA combines the two via iterative gradient ascent on a predefined semantic attribute space and the image space. We illustrate the validity of our approach in three scenarios for keypoint detection and classification. (1) Model diagnosis: SIA generates a histogram of attributes that highlights the semantic vulnerability of the ML model (i.e., attributes that make the model fail). (2) Stronger attacks: SIA generates adversarial examples with visually interpretable attributes that lead to higher attack success rates than baseline methods. Adversarial training on SIA examples improves transferable robustness across different gradient-based attacks. (3) Robustness to imbalanced datasets: we use SIA to augment the underrepresented classes, which outperforms strong augmentation and re-balancing baselines.
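The abstract describes the core mechanism only in one sentence: iterative gradient ascent over a semantic attribute space and the image space, followed by a histogram of the attributes that cause failures. The sketch below is a toy illustration of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the "generator" is a linear map `W` from attributes to pixels, the "model" is a logistic classifier `w_clf`, and the names `sia_like_attack`, `vulnerability_histogram`, `lr_attr`, `lr_img`, and `eps` are hypothetical.

```python
import numpy as np


def bce_loss(logit, y):
    # Numerically stable binary cross-entropy computed from a raw logit.
    return np.logaddexp(0.0, logit) - y * logit


def sia_like_attack(a0, W, w_clf, y, steps=20, lr_attr=0.01, lr_img=0.005, eps=0.05):
    """Joint gradient ascent on attributes `a` and a pixel perturbation `delta`.

    Toy assumptions (not from the paper): image x = W @ a + delta,
    classifier logit = w_clf @ x, loss = binary cross-entropy against y.
    """
    a = np.array(a0, dtype=float)
    delta = np.zeros(W.shape[0])
    for _ in range(steps):
        x = W @ a + delta                      # "generate" the image
        logit = w_clf @ x
        p = 1.0 / (1.0 + np.exp(-logit))       # predicted probability of class 1
        dlogit = p - y                         # d(loss)/d(logit) for cross-entropy
        grad_a = dlogit * (W.T @ w_clf)        # chain rule through x = W @ a
        grad_delta = dlogit * w_clf            # gradient w.r.t. the pixels
        a = a + lr_attr * grad_a               # ascent step in attribute space
        # Signed ascent step in image space, kept inside a small budget eps.
        delta = np.clip(delta + lr_img * np.sign(grad_delta), -eps, eps)
    return a, delta


def vulnerability_histogram(W, w_clf, n_samples=100, seed=0):
    """Count, per attribute, how often it is the one the attack changed most."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(W.shape[1], dtype=int)
    for _ in range(n_samples):
        a0 = rng.normal(size=W.shape[1])
        y = float(w_clf @ (W @ a0) > 0)        # attack the model's own prediction
        a_adv, _ = sia_like_attack(a0, W, w_clf, y)
        counts[np.argmax(np.abs(a_adv - a0))] += 1
    return counts


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))                   # 4 hypothetical attributes, 16 pixels
w_clf = rng.normal(size=16)
print(vulnerability_histogram(W, w_clf))
```

Because this toy model is linear, the histogram tends to concentrate on the single attribute with the largest sensitivity `|W.T @ w_clf|`. With a real generative model and a deep network, the attribute gradients would vary per image, which is what would make such a histogram informative for diagnosis.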

