Pre-trained large language models (LLMs) are an integral part of modern AI and have led to breakthrough performance on complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish their downstream AI tasks. However, it has been shown that an adversary can extract or reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework that allows adding noise during the training or fine-tuning of LLMs such that extracting the training data becomes infeasible (i.e., it succeeds only with a cryptographically small probability). While the theoretical privacy guarantees offered in most extant studies assume learning models from scratch through many training iterations in an asymptotic setting, this assumption does not hold in fine-tuning scenarios, in which the number of training iterations is significantly smaller. To address this gap, we present \ewtune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while \ewtune~adds privacy guarantees to the LLM fine-tuning process, it directly contributes to decreasing the induced noise by up to 5.6\% and improves state-of-the-art LLM performance by up to 1.1\% across all NLU tasks. We have open-sourced our implementation for wide adoption and public testing.
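To make the phrase "adding noise in the process of training or fine-tuning" concrete, the following is a minimal sketch of one generic DP-SGD-style gradient estimate: each per-example gradient is clipped, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. This is an illustration under stated assumptions, not the \ewtune implementation; the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the Edgeworth accountant, which would determine how much noise a given privacy budget requires, is not shown.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient (generic DP-SGD sketch, hypothetical helper).

    per_example_grads: list of equal-length lists of floats, one per example.
    clip_norm: maximum L2 norm allowed for any single example's gradient.
    noise_multiplier: Gaussian noise std is noise_multiplier * clip_norm.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only gradients whose norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            total[i] += x * scale
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    # Add independent Gaussian noise per coordinate to the sum, then average.
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]
```

In this sketch, a smaller `noise_multiplier` for the same privacy budget means less perturbation of the averaged gradient, which is the quantity a tighter privacy accountant aims to reduce.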