As deep learning is now used in many real-world applications, research has increasingly focused on the privacy of deep learning models and on how to prevent attackers from obtaining sensitive information about the training data. However, image-text models like CLIP have not yet been examined in the context of privacy attacks. While membership inference attacks aim to tell whether a specific data point was used for training, we introduce a new type of privacy attack, named identity inference attack (IDIA), designed for multi-modal image-text models like CLIP. Using IDIAs, an attacker can reveal whether a particular person was part of the training data by querying the model in a black-box fashion with different images of that person. Letting the model choose from a wide variety of possible text labels, the attacker can probe whether the model recognizes the person and, therefore, whether images of that person were used for training. Through several experiments on CLIP, we show that the attacker can identify individuals used for training with very high accuracy and that the model learns to connect names with the depicted people. Our experiments show that a multi-modal image-text model indeed leaks sensitive information about its training data and, therefore, should be handled with care.
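To make the querying procedure concrete, the following is a minimal sketch of how an attacker might probe a publicly available CLIP checkpoint with several images of a target person and a pool of candidate name prompts. The checkpoint name, prompt template, candidate-name pool, and decision threshold are illustrative assumptions; they are not the paper's exact experimental setup.

```python
# Minimal IDIA-style probing sketch (assumptions: public CLIP checkpoint,
# "a photo of <name>" prompts, majority-style decision threshold).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def idia_score(image_paths, target_name, candidate_names):
    """Fraction of the target person's images for which CLIP picks the
    correct name out of a large pool of candidate name prompts."""
    prompts = [f"a photo of {name}" for name in candidate_names]
    hits = 0
    for path in image_paths:
        image = Image.open(path).convert("RGB")
        inputs = processor(text=prompts, images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            # logits_per_image has shape (1, num_prompts): image-text similarity
            logits = model(**inputs).logits_per_image
        predicted = candidate_names[logits.argmax(dim=-1).item()]
        hits += int(predicted == target_name)
    return hits / len(image_paths)


# Hypothetical usage: if the model names the person correctly for most of
# the query images, the attacker infers the person appeared in the training
# data (the 0.5 threshold here is an assumption for illustration).
# score = idia_score(["img1.jpg", "img2.jpg"], "Jane Doe", candidate_names)
# person_in_training_data = score > 0.5
```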