Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both of which require re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performance for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. Through comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori, while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.
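The core operation described above is gradient ascent on the target token sequences, i.e., maximizing the language modeling loss on the sequences to be forgotten. The following is a minimal sketch of that idea using a Hugging Face causal LM; the model name, target text, learning rate, and iteration count are illustrative placeholders and not the paper's exact configuration (see the released repository for the actual implementation).

```python
# Minimal sketch of knowledge unlearning via gradient ascent on target sequences.
# Assumptions: a Hugging Face causal LM; "gpt2", the target text, lr, and the
# number of steps are placeholders, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical token sequence(s) that should be forgotten.
target_texts = ["<sequence containing private information to forget>"]
batch = tokenizer(target_texts, return_tensors="pt")

model.train()
for step in range(10):  # "just a few iterations" of unlearning
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["input_ids"],  # standard LM objective on the target tokens
    )
    # Gradient ascent: maximize the LM loss on the target sequences
    # by minimizing its negation.
    loss = -outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice one would monitor both extraction metrics on the target sequences and general language modeling performance on held-out data to decide when to stop unlearning.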