As deep learning models grow in scale and complexity, it becomes increasingly difficult to understand how AI systems identify objects. An adversary can exploit this opacity by adding imperceptible perturbations to an image, causing the AI to misrecognize the entities it contains. This paper investigates the adversarial robustness of LLaVA-1.5-13B and Meta's Llama 3.2 Vision-8B-2. Both models are subjected to untargeted PGD (Projected Gradient Descent) attacks against the visual input modality and evaluated empirically on a subset of the Visual Question Answering (VQA) v2 dataset. The impact of the attacks is quantified with the standard VQA accuracy metric, and the resulting accuracy degradation (accuracy drop) of LLaVA and Llama 3.2 Vision is compared. A key finding is that Llama 3.2 Vision, despite a lower baseline accuracy in this setup, exhibited a smaller drop in performance under attack than LLaVA, particularly at higher perturbation levels. Overall, the findings confirm that the vision modality represents a viable attack vector for degrading the performance of contemporary open-weight VLMs, including Meta's Llama 3.2 Vision. They further highlight that adversarial robustness does not necessarily correlate with standard benchmark performance and may be influenced by underlying architectural and training factors.
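For context, the attack referenced above is standard untargeted PGD under an L∞ budget. The sketch below illustrates the iterative update and projection steps, assuming a PyTorch-style interface; `model` and `loss_fn` are illustrative placeholders, not the actual LLaVA or Llama 3.2 Vision pipelines evaluated in the paper.

```python
# Minimal sketch of an untargeted PGD attack on a model's image input.
# Assumes a PyTorch-style interface; `model` and `loss_fn` are placeholders.
import torch

def pgd_untargeted(model, loss_fn, image, target, eps=8/255, alpha=2/255, steps=10):
    """Iteratively perturb `image` to maximize the loss, projecting each step
    back into the L-infinity ball of radius `eps` around the original image."""
    adv = image.clone().detach()
    # Random start inside the eps-ball (a common PGD variant).
    adv = (adv + torch.empty_like(adv).uniform_(-eps, eps)).clamp(0, 1)

    for _ in range(steps):
        adv.requires_grad_(True)
        # Untargeted attack: ascend the loss w.r.t. the ground-truth answer.
        loss = loss_fn(model(adv), target)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                 # gradient-sign step
            adv = image + (adv - image).clamp(-eps, eps)    # project into eps-ball
            adv = adv.clamp(0, 1)                           # keep a valid image
    return adv.detach()
```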