Given a set of unlabeled images or (image, text) pairs, contrastive learning aims to pre-train an image encoder that can be used as a feature extractor for many downstream tasks. In this work, we propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning. In particular, given an input and black-box access to an image encoder, EncoderMI aims to infer whether the input is in the training dataset of the image encoder. EncoderMI can be used 1) by a data owner to audit whether its (public) data was used to pre-train an image encoder without its authorization or 2) by an attacker to compromise the privacy of the training data when it is private/sensitive. EncoderMI exploits the overfitting of the image encoder towards its training data. In particular, an overfitted image encoder tends to output more similar feature vectors for two augmented versions of an input that is in its training dataset than for two augmented versions of an input that is not. We evaluate EncoderMI on image encoders that we pre-trained ourselves on multiple datasets, as well as on the Contrastive Language-Image Pre-training (CLIP) image encoder, which is pre-trained on 400 million (image, text) pairs collected from the Internet and released by OpenAI. Our results show that EncoderMI can achieve high accuracy, precision, and recall. We also explore a countermeasure against EncoderMI that prevents overfitting through early stopping. Our results show that early stopping trades off the accuracy of EncoderMI against the utility of the image encoder, i.e., it can reduce the accuracy of EncoderMI, but it also reduces the classification accuracy of the downstream classifiers built on the image encoder.
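The overfitting signal described above can be illustrated with a minimal sketch. It is not the EncoderMI implementation, whose inference classifiers are not specified in this abstract. It assumes a simplified decision rule (average pairwise cosine similarity across augmented views, compared against a threshold), and every name in it (`membership_score`, `infer_membership`, `toy_encoder`, `toy_augment`, the threshold 0.9) is hypothetical. The toy encoder stands in for an overfitted model by returning a fixed feature vector for any view close to a "memorized" training point.

```python
import math
import random
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def membership_score(encoder, augment, x, n_views=10, seed=0):
    """Average pairwise cosine similarity between features of augmented views of x.

    `encoder` is queried as a black box: only its output vectors are used.
    """
    if n_views < 2:
        raise ValueError("need at least two augmented views to form a pair")
    rng = random.Random(seed)
    features = [encoder(augment(x, rng)) for _ in range(n_views)]
    sims = [cosine_similarity(u, v) for u, v in combinations(features, 2)]
    return sum(sims) / len(sims)


def infer_membership(score, threshold):
    """Assumed rule: predict 'member' when the views are similar enough."""
    return score >= threshold


# --- Hypothetical toy setup, for illustration only ---
MEMORIZED = [(5.0, 5.0)]  # stand-in for the encoder's training data


def toy_augment(x, rng):
    """Stand-in for image augmentation: bounded random jitter per coordinate."""
    return tuple(xi + rng.uniform(-0.5, 0.5) for xi in x)


def toy_encoder(x):
    """Mimics overfitting: views near a memorized point collapse to one vector."""
    for m in MEMORIZED:
        if math.dist(x, m) < 2.0:
            return m
    return x  # unseen inputs: features vary with the augmentation


member_score = membership_score(toy_encoder, toy_augment, (5.0, 5.0))
nonmember_score = membership_score(toy_encoder, toy_augment, (0.1, -0.1))

THRESHOLD = 0.9  # assumed value; a real attack would have to calibrate it
print("member:", member_score, infer_membership(member_score, THRESHOLD))
print("non-member:", nonmember_score, infer_membership(nonmember_score, THRESHOLD))
```

In this toy setting the member input scores higher than the non-member input because its augmented views all map to the same feature vector. With a real encoder the gap would be much smaller, and the threshold (or a learned classifier in its place) would need to be calibrated, for example on data whose membership status is known.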