Pre-trained Large Language Models (LLMs) are an integral part of modern AI that have led to breakthrough performance in complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners increasingly adopt these pre-trained models and fine-tune them on their private data to accomplish downstream AI tasks. However, it has been shown that an adversary can extract/reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework for adding noise during the training or fine-tuning of LLMs such that extracting the training data becomes infeasible (i.e., succeeds only with a cryptographically small probability). While the theoretical privacy guarantees offered in most extant studies assume learning models from scratch through many training iterations in an asymptotic setting, this assumption does not hold in fine-tuning scenarios, where the number of training iterations is significantly smaller. To address this gap, we present \ewtune, a DP framework for fine-tuning LLMs based on the Edgeworth accountant with finite-sample privacy guarantees. Our results across four well-established natural language understanding (NLU) tasks show that while \ewtune~adds privacy guarantees to the LLM fine-tuning process, it also reduces the induced noise by up to 5.6\% and improves state-of-the-art LLM performance by up to 1.1\% across all NLU tasks. We have open-sourced our implementation for wide adoption and public testing.
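For context, frameworks of this kind build on the standard DP-SGD update, which clips each per-example gradient to norm $C$ and perturbs the batch sum with Gaussian noise; the sketch below is the widely used formulation (not \ewtune's specific contribution, which instead concerns how the privacy loss of these updates is accounted for in the finite-sample regime):
\begin{equation*}
\tilde{g}_t \;=\; \frac{1}{B}\left( \sum_{i=1}^{B} \frac{g_t(x_i)}{\max\!\left(1,\; \|g_t(x_i)\|_2 / C\right)} \;+\; \mathcal{N}\!\left(0,\; \sigma^2 C^2 \mathbf{I}\right) \right),
\end{equation*}
where $B$ is the batch size, $g_t(x_i)$ the gradient on example $x_i$ at step $t$, and $\sigma$ the noise multiplier. A tighter privacy accountant permits a smaller $\sigma$ for the same $(\epsilon,\delta)$ budget, which is the mechanism behind the reported noise reduction.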