Understanding the memorization and privacy leakage risks in Contrastive Language--Image Pretraining (CLIP) is critical for ensuring the security of multimodal models. Recent studies have demonstrated the feasibility of extracting sensitive training examples from diffusion models, with conditional diffusion models exhibiting a stronger tendency to memorize and leak information. In this work, we investigate data memorization and extraction risks in CLIP through the lens of CLIP inversion, a process that aims to reconstruct training images from text prompts. To this end, we introduce \textbf{LeakyCLIP}, a novel attack framework designed to achieve high-quality, semantically accurate image reconstruction from CLIP embeddings. We identify three key challenges in CLIP inversion: 1) non-robust features, 2) limited visual semantics in text embeddings, and 3) low reconstruction fidelity. To address these challenges, LeakyCLIP employs 1) adversarial fine-tuning to enhance optimization smoothness, 2) linear transformation-based embedding alignment, and 3) Stable Diffusion-based refinement to improve fidelity. Empirical results demonstrate the superiority of LeakyCLIP, which achieves an improvement of over 258% in Structural Similarity Index Measure (SSIM) for ViT-B-16 compared to baseline methods on a LAION-2B subset. Furthermore, we uncover a pervasive leakage risk, showing that training data membership can be successfully inferred even from metrics computed on low-fidelity reconstructions. Our work introduces a practical method for CLIP inversion while offering novel insights into the nature and scope of privacy risks in multimodal models.
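At its core, CLIP inversion can be framed as optimizing image pixels so that the image encoder's embedding matches a target (text-derived) embedding. The sketch below illustrates this idea only; it is not the paper's implementation. A frozen random linear map stands in for CLIP's image encoder (an assumption made for self-containment), and plain gradient ascent maximizes cosine similarity between the current embedding and the target.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
D, K = 192, 32  # flattened "image" size and embedding size (toy values)
W = rng.standard_normal((K, D)) / np.sqrt(D)  # frozen toy encoder, stand-in for CLIP
t = rng.standard_normal(K)                    # target embedding (from a text prompt)
x = 0.01 * rng.standard_normal(D)             # "image" being reconstructed

lr = 0.5
for _ in range(200):
    e = W @ x
    ne, nt = np.linalg.norm(e), np.linalg.norm(t)
    # analytic gradient of cos(e, t) with respect to e,
    # chained through the linear encoder via W.T
    g_e = t / (ne * nt) - (e @ t) * e / (ne**3 * nt)
    x += lr * (W.T @ g_e)  # gradient ascent on cosine similarity

print(cosine(W @ x, t))  # close to 1: the embedding now matches the target
```

In the real attack the encoder is a deep network, which is why the paper's first two components (adversarial fine-tuning for smoother optimization and embedding alignment) matter: raw pixel-space gradients through a standard CLIP encoder are dominated by non-robust features.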
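The second component, linear transformation-based embedding alignment, can be sketched as fitting a linear map on paired text/image embeddings so that text embeddings are moved toward the image-embedding distribution before inversion. The example below is a hedged illustration under toy assumptions (synthetic paired embeddings related by an unknown linear map plus noise), not the paper's procedure; it uses ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 32   # shared CLIP embedding dimensionality (toy value)
N = 500  # number of paired (text, image) embeddings

# Synthetic stand-ins: image embeddings, and text embeddings related to
# them by an unknown linear map plus noise (assumption for illustration).
img = rng.standard_normal((N, K))
A_true = rng.standard_normal((K, K))
txt = img @ A_true + 0.05 * rng.standard_normal((N, K))

# Least-squares alignment: find M so that txt @ M approximates the image
# embeddings; inversion then targets the aligned embedding txt @ M
# instead of the raw text embedding.
M, *_ = np.linalg.lstsq(txt, img, rcond=None)

pred = txt @ M
rel_err = np.linalg.norm(pred - img) / np.linalg.norm(img)
print(rel_err)  # small relative error on this synthetic pairing
```

With real CLIP embeddings the residual is much larger, since text embeddings carry limited visual semantics; the alignment only narrows, not closes, the modality gap, which motivates the third component (diffusion-based refinement).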