Large language models have been shown to memorize private information, such as social security numbers, that appears in their training data. Given the sheer scale of the training corpus, screening and filtering such private data, whether manually or automatically, is challenging. In this paper, we propose Confidentially Redacted Training (CRT), a method for training language generation models while protecting confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method provably prevents unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that models trained with CRT obtain almost the same perplexity while providing strong confidentiality.
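As a rough illustration of the idea of randomizing only the parts of training that touch confidential text, the sketch below shows a screening policy gating between a noised, clipped update and a plain update. This is a minimal sketch under assumed details, not the paper's CRT algorithm: the regex screening policy, the batch-level clipping (rather than per-example clipping as in standard DP-SGD), and the noise scale are hypothetical placeholders chosen for brevity.

```python
# Illustrative sketch only (assumed PyTorch setup); simplifies the idea of
# randomizing updates on batches flagged as containing confidential segments.
import re
import torch


def contains_confidential(batch_texts):
    # Hypothetical screening policy: flag batches containing an SSN-like
    # pattern. The paper argues that even an approximately correct policy
    # amplifies the confidentiality guarantee.
    return any(re.search(r"\b\d{3}-\d{2}-\d{4}\b", t) for t in batch_texts)


def randomized_step(model, loss, optimizer, clip_norm=1.0, noise_std=0.1):
    # DP-SGD-style randomized update: clip the gradient norm, then add
    # Gaussian noise before stepping (batch-level clipping for brevity).
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(torch.randn_like(p.grad) * noise_std)
    optimizer.step()


def plain_step(model, loss, optimizer):
    # Standard (non-randomized) update for batches with no flagged text.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a training loop, each batch would be routed to `randomized_step` or `plain_step` depending on `contains_confidential`, so that only the portions of training that touch flagged segments incur the randomization.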