Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them with little to no degradation of general language modeling performance; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori, while being orders of magnitude more computationally efficient. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
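To make the core mechanism concrete, the sketch below illustrates one way the unlikelihood-style objective can be applied to target token sequences: the standard causal-LM loss on those sequences is negated, so gradient steps push their likelihood down. This is a minimal illustration, not the authors' released implementation (see the repository linked above); the checkpoint name, learning rate, and target strings are placeholders.

```python
# Minimal sketch of unlearning target sequences by negating the LM loss.
# Placeholders: "gpt2" checkpoint, lr=5e-5, and the example target string.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

target_sequences = ["example private string to forget"]  # hypothetical data to unlearn

model.train()
for text in target_sequences:
    batch = tokenizer(text, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    loss = -outputs.loss  # negate the NLL: ascend rather than descend on target tokens
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice one would run only a few such iterations and monitor general language modeling performance (e.g., validation perplexity on held-out, non-target data) to confirm that forgetting the targets does not degrade the underlying LM.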