SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space

Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation, where a model generates a segmentation mask from a textual query. Prior work has focused primarily on perturbing image inputs. Semantically equivalent textual paraphrases remain underexplored, even though they are crucial in real-world applications where users express the same intent in varied ways. To address this gap, we introduce a novel adversarial paraphrasing task: generating grammatically correct paraphrases that preserve the meaning of the original query while degrading segmentation performance. To evaluate the quality of adversarial paraphrases, we develop a comprehensive automatic evaluation protocol and validate it with human studies. Furthermore, we introduce SPARTA, a black-box, sentence-level optimization method that operates in the low-dimensional semantic latent space of a text autoencoder and is guided by reinforcement learning. SPARTA achieves significantly higher success rates, outperforming prior methods by up to 2x on both the ReasonSeg and LLMSeg-40k datasets. We use SPARTA and competitive baselines to assess the robustness of advanced reasoning segmentation models, and find that they remain vulnerable to adversarial paraphrasing, even under strict semantic and grammatical constraints. All code and data will be released publicly upon acceptance.

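The adversarial paraphrasing task defined in the abstract combines three conditions: the paraphrase preserves the query's meaning, it is grammatical, and it degrades segmentation quality. The following is a minimal sketch of such a success check, not the paper's evaluation protocol. The names `ParaphraseCheck` and `is_successful_attack`, the use of IoU as the segmentation metric, and the thresholds `min_similarity` and `min_iou_drop` are all illustrative assumptions; the abstract does not specify the metric, the scoring models, or any cutoffs.

```python
from dataclasses import dataclass
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks agree perfectly.
    return inter / union if union else 1.0


@dataclass
class ParaphraseCheck:
    """Scores for one candidate paraphrase (all fields are hypothetical inputs)."""
    semantic_similarity: float  # assumed score in [0, 1] from some similarity model
    is_grammatical: bool        # assumed verdict from some grammar checker
    iou_original: float         # mask IoU obtained with the original query
    iou_paraphrase: float       # mask IoU obtained with the paraphrased query


def is_successful_attack(
    check: ParaphraseCheck,
    min_similarity: float = 0.8,  # assumed threshold, not from the paper
    min_iou_drop: float = 0.1,    # assumed threshold, not from the paper
) -> bool:
    """True if the paraphrase keeps the meaning, is grammatical, and hurts IoU."""
    preserves_meaning = check.semantic_similarity >= min_similarity
    degrades = (check.iou_original - check.iou_paraphrase) >= min_iou_drop
    return preserves_meaning and check.is_grammatical and degrades


# Toy 2x2 masks: the original query yields a perfect mask, the paraphrase does not.
ground_truth = [[1, 1], [0, 0]]
mask_original = [[1, 1], [0, 0]]
mask_paraphrase = [[1, 0], [0, 1]]

candidate = ParaphraseCheck(
    semantic_similarity=0.9,
    is_grammatical=True,
    iou_original=iou(mask_original, ground_truth),
    iou_paraphrase=iou(mask_paraphrase, ground_truth),
)
print(is_successful_attack(candidate))
```

In this toy case the paraphrase counts as a successful attack because all three conditions hold. A black-box method such as SPARTA would only need the model's output masks to compute the degradation term, which is what makes a check of this shape usable without access to model gradients.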