Membership Inference Attacks (MIAs) aim to determine whether a specific data point was included in the training set of a target model. Although numerous methods have been developed for detecting data contamination in large language models (LLMs), their performance on multimodal LLMs (MLLMs) falls short due to the instabilities introduced by multimodal component adaptation and possible distribution shifts across multiple inputs. In this work, we investigate multimodal membership inference and address two issues: first, we identify distribution shifts in the existing datasets; second, we release an extended baseline pipeline to detect them. We also generalize perturbation-based membership inference methods to MLLMs and release \textbf{FiMMIA} -- a modular \textbf{F}ramework for \textbf{M}ultimodal \textbf{MIA}.\footnote{The source code and framework have been made publicly available under the MIT license via \href{https://github.com/ai-forever/data_leakage_detect}{link}. The video demonstration is available on \href{https://youtu.be/a9L4-H80aSg}{YouTube}.} Our approach trains a neural network to analyze the target model's behavior on perturbed inputs, capturing distributional differences between members and non-members. Comprehensive evaluations on various fine-tuned multimodal models demonstrate the effectiveness of our perturbation-based membership inference attacks in multimodal domains.
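To make the perturbation-based attack concrete, the following is a minimal illustrative sketch (not the exact FiMMIA implementation): each candidate sample is scored by the target model in its original and perturbed forms, and a small neural classifier is trained on the resulting statistics to separate members from non-members. Here \texttt{perturb\_fn} and \texttt{score\_fn} are hypothetical placeholders for a modality-specific perturbation routine and a target-model scoring call (e.g., the caption loss given the image); the actual framework may use a different feature set and classifier.
\begin{verbatim}
# Illustrative sketch of a perturbation-based MIA classifier.
# `perturb_fn(sample)` and `score_fn(sample)` are hypothetical placeholders
# for perturbing an input and querying the target MLLM for a score.
import numpy as np
from sklearn.neural_network import MLPClassifier

def perturbation_features(sample, perturb_fn, score_fn, n_perturb=8):
    """Score the original sample and several perturbed copies, then
    summarize how the target model's score shifts under perturbation."""
    base = score_fn(sample)
    perturbed = np.array([score_fn(perturb_fn(sample))
                          for _ in range(n_perturb)])
    # Members tend to show a larger gap between the original and
    # perturbed scores than non-members.
    return np.array([base, perturbed.mean(), perturbed.std(),
                     base - perturbed.mean()])

def fit_attack_model(samples, labels, perturb_fn, score_fn):
    """Train a small neural classifier: label 1 = member, 0 = non-member."""
    X = np.stack([perturbation_features(s, perturb_fn, score_fn)
                  for s in samples])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=0)
    clf.fit(X, np.asarray(labels))
    return clf
\end{verbatim}
At inference time, the fitted classifier's predicted membership probability on a new sample's perturbation features serves as the attack score.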