Transfer learning is a popular method for tuning pretrained (upstream) models for different downstream tasks using limited data and computational resources. We study how an adversary with control over an upstream model used in transfer learning can conduct property inference attacks on a victim's tuned downstream model. For example, the adversary may infer whether images of a specific individual are present in the downstream training set. We demonstrate attacks in which an adversary can manipulate the upstream model to conduct highly effective and specific property inference attacks (AUC score $> 0.9$) without incurring significant performance loss on the main task. The main idea of the manipulation is to make the upstream model generate activations (intermediate features) with different distributions for samples with and without the target property, making it easy for the adversary to distinguish downstream models trained with and without training examples that have the target property. Our code is available at https://github.com/yulongt23/Transfer-Inference.
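To make the manipulation idea concrete, below is a minimal toy sketch of the attack's logic, not the paper's actual method. It assumes a hypothetical upstream extractor in which one "secret" feature coordinate fires only for property samples, a ridge-regression linear head standing in for downstream fine-tuning, and illustrative constants (feature dimension, shift magnitude, sample counts) chosen purely for demonstration. Heads fine-tuned without property samples never exercise the secret coordinate, so an adversary probing that coordinate can separate the two populations of downstream models.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # feature dimension of the toy upstream model (illustrative)

def upstream_features(n, has_property):
    """Toy stand-in for the manipulated upstream feature extractor.

    A dedicated "secret" coordinate (index 0) stays at zero for ordinary
    samples but fires strongly for samples with the target property, so
    the two groups have clearly different activation distributions.
    """
    z = rng.normal(size=(n, D))
    z[:, 0] = 6.0 if has_property else 0.0
    return z

def train_downstream(include_property):
    """Fine-tune a linear head (ridge regression) on upstream features."""
    y = rng.integers(0, 2, size=200).astype(float)
    z = upstream_features(200, has_property=False)
    z[:, 1] += 2.0 * (y - 0.5)  # main-task signal lives in coordinate 1
    if include_property:
        # a few property samples slip into the victim's training set
        yp = rng.integers(0, 2, size=11).astype(float)
        zp = upstream_features(11, has_property=True)
        zp[:, 1] += 2.0 * (yp - 0.5)
        z, y = np.vstack([z, zp]), np.concatenate([y, yp])
    return np.linalg.solve(z.T @ z + 1e-2 * np.eye(D), z.T @ (y - 0.5))

# Adversary's test: probe each victim head along the secret coordinate.
# Heads trained without property samples put exactly zero weight there
# (the coordinate is never active in their training data), so the probe
# response separates the two populations of downstream models.
probe = np.zeros(D)
probe[0] = 6.0
scores = [abs(train_downstream(True) @ probe) for _ in range(50)] + \
         [abs(train_downstream(False) @ probe) for _ in range(50)]
labels = np.array([1] * 50 + [0] * 50)

# Mann-Whitney AUC over the two populations of trained downstream models
s = np.asarray(scores)
ranks = np.empty(len(s))
ranks[s.argsort()] = np.arange(1, len(s) + 1)
auc = (ranks[labels == 1].sum() - 50 * 51 / 2) / (50 * 50)
print(f"toy property-inference AUC: {auc:.2f}")  # close to 1.0
```

In this simplified setting the AUC is near 1.0 because the secret coordinate decouples cleanly under ridge regression; the paper's actual attack must achieve the separation with a deep upstream network while preserving main-task accuracy.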