Membership inference attacks (MIAs) aim to determine whether a specific sample was used to train a predictive model. Knowing this may indeed lead to a privacy breach. Most MIAs, however, make use of the model's prediction scores - the probability of each output given some input - following the intuition that the trained model tends to behave differently on its training data. We argue that this is a fallacy for many modern deep network architectures. Consequently, MIAs will fail miserably, since overconfidence leads to high false-positive rates not only on known domains but also on out-of-distribution data, and implicitly acts as a defense against MIAs. Specifically, using generative adversarial networks, we are able to produce a potentially infinite number of samples falsely classified as part of the training data. In other words, the threat of MIAs is overestimated, and less information is leaked than previously assumed. Moreover, there is actually a trade-off between the overconfidence of models and their susceptibility to MIAs: the more a classifier knows when it does not know, making low-confidence predictions, the more it reveals about the training data.
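To make the argument concrete, the following is a minimal sketch of the score-based attack family the abstract critiques: a sample is declared a training member whenever the model's top prediction score exceeds a threshold. The threshold value and the toy softmax outputs below are illustrative assumptions, not values from the paper; the point is that an overconfident classifier assigns near-one-hot scores to out-of-distribution inputs as well, so the attack flags them as members too.

```python
def confidence_mia(scores, threshold=0.9):
    """Score-based membership inference (sketch): flag a sample as a
    training member if its top softmax score exceeds the threshold."""
    return [max(s) > threshold for s in scores]

# Hypothetical softmax outputs for illustration only.
# An overconfident model is near-certain on its training data...
members = [[0.98, 0.01, 0.01],
           [0.97, 0.02, 0.01]]
# ...but also on out-of-distribution (e.g. GAN-generated) samples
# that were never part of the training set.
ood_samples = [[0.99, 0.005, 0.005],
               [0.96, 0.03, 0.01]]

print(confidence_mia(members))      # [True, True] - members flagged
print(confidence_mia(ood_samples))  # [True, True] - false positives
```

Since both groups clear the threshold, the attacker cannot distinguish members from non-members by prediction scores alone, which is the overestimation the abstract describes; conversely, a well-calibrated model that returned low scores on the out-of-distribution samples would separate the two groups and thus leak more membership information.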