Recent large-scale natural language processing (NLP) systems use a Large Language Model (LLM) pre-trained on massive and diverse corpora as a starting point. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data, thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy-to-interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.
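The abstract does not spell out the mechanism itself; as a rough illustration of what a model-agnostic, decode-time perturbation can look like, the sketch below applies the exponential mechanism to clipped next-token logits. The function name, clipping bound, and parameter choices are hypothetical assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def private_decode_step(logits, epsilon, clip=5.0, rng=None):
    """Sample one token from a perturbed next-token distribution.

    Illustrative sketch only: clips each logit to [-clip, clip] so the
    utility scores have bounded sensitivity, then samples via the
    exponential mechanism, i.e. a softmax with temperature 2*clip/epsilon.
    Smaller epsilon means more noise (stronger privacy, lower utility).
    """
    rng = rng or np.random.default_rng()
    u = np.clip(np.asarray(logits, dtype=np.float64), -clip, clip)
    scaled = epsilon * u / (2.0 * clip)      # exponential-mechanism scaling
    probs = np.exp(scaled - scaled.max())    # stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Example: decode one token from toy logits under privacy budget epsilon = 1.0.
token_id = private_decode_step(logits=[2.1, 0.3, -1.0, 4.2], epsilon=1.0)
```

Because the perturbation only touches the output distribution at each decoding step, it requires no retraining and can wrap any LLM's logits, which is the sense in which such a mechanism is model-agnostic.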