Vision Transformers (ViTs) are becoming an increasingly popular architecture for vision tasks, achieving state-of-the-art performance on image classification. However, although early work suggested that this architecture is more robust to adversarial attacks, subsequent studies have argued that ViTs remain vulnerable. This paper presents our first attempt at detecting adversarial attacks at inference time using the network's input, output, and latent features. We design four quantifications (or derivatives) of the input, output, and latent vectors of ViT-based models that provide a signature of the inference, which could aid attack detection, and empirically study their behavior on clean and adversarial samples. The results demonstrate that quantifications derived from the input (images) and output (posterior probabilities) are promising for distinguishing clean from adversarial samples, while latent vectors offer less discriminative power, though they provide some insight into how adversarial perturbations work.
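The abstract does not define the four quantifications, so the following is only a minimal sketch of the general idea: compute a scalar signature from the model's output posteriors, calibrate a threshold on clean samples, and flag inferences whose signature deviates from the clean distribution. The entropy-based quantification, the quantile calibration, the function names, and the synthetic posteriors below are all illustrative assumptions, not the paper's method.

```python
# Sketch of inference-time attack detection from output statistics.
# Assumptions: entropy of the posterior is used as one quantification,
# and a threshold is calibrated on held-out clean samples.
import numpy as np

def output_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of posterior probabilities (shape [N, C])."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def calibrate_threshold(clean_scores: np.ndarray, quantile: float = 0.95) -> float:
    """Pick a detection threshold as a high quantile of the clean-sample scores."""
    return float(np.quantile(clean_scores, quantile))

def detect(probs: np.ndarray, threshold: float) -> np.ndarray:
    """Flag samples whose output entropy exceeds the calibrated threshold."""
    return output_entropy(probs) > threshold

# Usage with synthetic posteriors (hypothetical stand-ins for real model outputs):
# clean inferences tend to be peaked (low entropy), perturbed ones flatter.
rng = np.random.default_rng(0)
clean = rng.dirichlet(np.full(10, 0.3), size=256)  # peaked, confident posteriors
adv = rng.dirichlet(np.full(10, 3.0), size=256)    # flatter, higher-entropy posteriors
tau = calibrate_threshold(output_entropy(clean))
print("clean flagged:", detect(clean, tau).mean())  # ~0.05 by construction
print("adv flagged:  ", detect(adv, tau).mean())    # substantially higher
```

An input-side quantification (e.g., a norm of local pixel differences) could be thresholded the same way; the calibration-then-detect structure is the part this sketch is meant to convey.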