Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim. This is a vulnerability that can compromise the privacy of the model's training data. In this work, we introduce SubMix: a practical protocol for private next-token prediction designed to prevent privacy violations by language models that were fine-tuned on a private corpus after pre-training on a public corpus. We show that SubMix limits the leakage of information that is unique to any individual user in the private corpus via a relaxation of group differentially private prediction. Importantly, SubMix admits a tight, data-dependent privacy accounting mechanism, which allows it to thwart existing data-extraction attacks while maintaining the utility of the language model. SubMix is the first protocol that maintains privacy even when publicly releasing tens of thousands of next-token predictions made by large transformer-based models such as GPT-2.
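The abstract does not describe the mechanism itself, but the core idea of private next-token prediction by blending a privately fine-tuned ensemble with the public pre-trained model can be illustrated with a minimal sketch. The function name `submix_style_prediction`, the mixing weight `lam`, and the toy setup below are illustrative assumptions for intuition only; they are not the paper's actual protocol, which further relies on disjoint partitions of the private corpus and a data-dependent privacy accountant.

```python
import numpy as np

def submix_style_prediction(private_probs, public_probs, lam):
    """Blend an ensemble of private next-token distributions with the public
    model's distribution. lam controls how much the output can depend on the
    private models: lam = 0 releases only the public prediction. This is an
    illustrative mixture, not the paper's mechanism or its privacy accounting.
    """
    private_mean = np.mean(private_probs, axis=0)   # average over ensemble members
    mixed = lam * private_mean + (1.0 - lam) * public_probs
    return mixed / mixed.sum()                      # renormalize for numerical safety

# Toy usage over a 5-token vocabulary: two hypothetical private ensemble
# members and one public (pre-trained) model.
vocab_size = 5
rng = np.random.default_rng(0)
private_probs = rng.dirichlet(np.ones(vocab_size), size=2)
public_probs = rng.dirichlet(np.ones(vocab_size))
print(submix_style_prediction(private_probs, public_probs, lam=0.3))
```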