When an adversary injects poison samples into a machine learning model's training data, privacy leakage, such as membership inference attacks that infer whether a sample was included in the model's training, becomes more effective because poisoning pushes the target sample toward being an outlier. However, such attacks can be detected because the poison samples cause the model's inference accuracy to deteriorate. In this paper, we discuss a \textit{backdoor-assisted membership inference attack}, a novel membership inference attack that leverages backdoors, which return the adversary's intended output for any triggered sample. Through experiments on an academic benchmark dataset, we obtain three crucial insights. First, we demonstrate that the backdoor-assisted membership inference attack is unsuccessful. Second, when we analyze the loss distributions to understand why it fails, we find that backdoors cannot separate the loss distributions of training and non-training samples; in other words, backdoors do not affect the distribution of clean samples. Third, we show that poison samples and triggered samples yield different distributions of neuron activations. Specifically, backdoors leave every clean sample an inlier, in contrast to poison samples, which become outliers. As a result, we confirm that backdoors cannot assist membership inference.
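As a point of reference for the loss-based analysis, membership inference is commonly formalized as a threshold test on the per-sample loss; the notation below is a minimal sketch introduced for illustration and is not fixed by this abstract. Given a target model $f_\theta$, a per-sample loss $\ell$ (e.g., cross-entropy), and a threshold $\tau$ calibrated on non-member losses, the adversary predicts membership of a labeled sample $(x, y)$ as
\[
% illustrative notation: f_\theta, \ell, and \tau are not defined in the original abstract
\mathcal{A}(x, y) = \mathbf{1}\left[\, \ell\big(f_\theta(x), y\big) < \tau \,\right],
\]
i.e., a sample with unusually small loss is inferred to be a training member. Such an attack can succeed only when the loss distributions of training and non-training samples are separable, which is exactly the separation that backdoor triggers fail to induce in our experiments.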