Machine learning models are vulnerable to membership inference attacks, in which an adversary aims to predict whether or not a particular sample was contained in the target model's training dataset. Existing attack methods commonly exploit the output information (mostly, losses) solely from the given target model. As a result, in practical scenarios where both member and non-member samples yield similarly small losses, these methods are naturally unable to differentiate between them. To address this limitation, in this paper we propose a new attack method, called \system, which exploits the membership information from the whole training process of the target model to improve the attack performance. To mount the attack in the common black-box setting, we leverage knowledge distillation and represent the membership information by the losses evaluated on a sequence of intermediate models at different distillation epochs, namely the \emph{distilled loss trajectory}, together with the loss from the given target model. Experimental results over different datasets and model architectures demonstrate the clear advantage of our attack across different metrics. For example, on CINIC-10, our attack achieves a true-positive rate at least 6$\times$ higher than that of existing methods at a low false-positive rate of 0.1\%. Further analysis demonstrates the general effectiveness of our attack in stricter scenarios.
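To make the idea of a distilled loss trajectory concrete, the following is a minimal PyTorch sketch, not the paper's reference implementation: for one sample, it collects the loss under each intermediate distilled model (one checkpoint per distillation epoch) and appends the loss under the given target model, yielding the feature vector that a membership attack classifier would consume. The names \texttt{distilled\_checkpoints}, \texttt{model\_fn}, and the checkpoint format are hypothetical placeholders.

\begin{verbatim}
import torch
import torch.nn.functional as F

def distilled_loss_trajectory(sample, label, distilled_checkpoints,
                              target_model, model_fn):
    """Return a feature vector: losses at each distillation epoch,
    followed by the loss from the target model itself."""
    x = sample.unsqueeze(0)            # add a batch dimension
    y = torch.tensor([label])
    feats = []
    for ckpt_path in distilled_checkpoints:   # one checkpoint per epoch
        model = model_fn()                     # fresh model instance
        model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
        model.eval()
        with torch.no_grad():
            feats.append(F.cross_entropy(model(x), y).item())
    target_model.eval()
    with torch.no_grad():
        feats.append(F.cross_entropy(target_model(x), y).item())
    return torch.tensor(feats)         # input to the attack model
\end{verbatim}

Under this sketch, the attack model (for example, a small MLP binary classifier) would be trained on such trajectories computed for known member and non-member samples of a shadow model, and then applied to trajectories from the target model.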