Differentially Private Fine-tuning of Language Models

Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. When privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8, respectively (privacy budget of $\epsilon = 6.8$, $\delta = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
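The abstract does not spell out the training procedure. As a rough illustration only, the sketch below shows the gradient step commonly used for differentially private training (DP-SGD style): clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise, and average. It assumes that only a small set of adapter parameters is trainable and that the pre-trained weights are frozen. The function names, the three-parameter "adapter", and the numeric values are hypothetical and are not taken from the paper.

```python
import math
import random


def clip_to_norm(grad, clip_norm):
    """Scale grad down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_to_norm(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm  # noise std applied to the summed gradient
    batch = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


# Hypothetical per-example gradients for a 3-parameter adapter (assumption:
# the pre-trained weights are frozen and receive no gradient at all).
grads = [[3.0, 4.0, 0.0], [0.3, 0.0, 0.4]]
rng = random.Random(0)
noisy = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
exact = private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
print("noisy:", noisy)
print("noise-free:", exact)
```

In this sketch the per-example gradients and the noise vector have one entry per trainable parameter, so restricting training to a small adapter shrinks both. This is one plausible reading of why parameter-efficient methods reduce the memory cost of private training; the actual privacy budget $\epsilon$ depends on the noise multiplier, sampling rate, and number of steps, which this sketch does not account for.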
