Reconstruction attacks allow an adversary to regenerate data samples of the training set using only access to a trained model. It has recently been shown that simple heuristics can reconstruct data samples from language models, making this threat scenario an important aspect of model release. Differential privacy is a known solution to such attacks, but it is often used with a relatively large privacy budget (epsilon > 8), which does not translate to meaningful guarantees. In this paper we show that, for the same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature. In particular, we show that larger privacy budgets do not protect against membership inference, but can still protect extraction of rare secrets. We show experimentally that our guarantees hold against various language models, including GPT-2 finetuned on Wikitext-103.
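As a rough back-of-the-envelope illustration of why a budget that is vacuous for membership inference can still bound secret extraction (a minimal sketch assuming pure epsilon-DP and a uniform prior, not the exact guarantee derived in the paper): if a secret $Z$ is drawn uniformly from $k$ candidate values and the model $\theta$ is released by an $\varepsilon$-DP mechanism, a Bayes-optimal adversary's guess $\hat{Z}$ satisfies
\[
\Pr[\hat{Z} = Z \mid \theta] \;\le\; \frac{e^{\varepsilon}}{e^{\varepsilon} + k - 1}.
\]
For membership inference ($k = 2$) and $\varepsilon = 8$ this bound is about $0.9997$ and thus uninformative, whereas for a rare secret with $k = 10^{6}$ plausible values it is about $3 \times 10^{-3}$, a meaningful protection from the same mechanism.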