Large language models have been shown to present privacy risks through memorization of their training data, and several recent works have studied such risks for the pre-training phase. Little attention, however, has been given to the fine-tuning phase, and it is not well understood how different fine-tuning methods (such as fine-tuning the full model, the model head, or adapters) compare in terms of memorization risk. This is an increasing concern as the "pre-train and fine-tune" paradigm proliferates. In this paper, we empirically study memorization in these fine-tuning methods using membership inference and extraction attacks, and show that their susceptibility to attacks differs substantially. We observe that fine-tuning only the model head has the highest susceptibility to attacks, whereas fine-tuning smaller adapters appears to be less vulnerable to known extraction attacks.
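To make the attack setting concrete, the sketch below illustrates the general loss-thresholding signal that membership inference attacks on language models rely on: sequences memorized during fine-tuning tend to receive unusually low loss. This is a minimal illustration under assumed details, not the paper's exact attack; the checkpoint name and threshold are placeholders, and in practice the threshold would be calibrated on held-out (non-member) data.

```python
# Minimal sketch of a loss-based membership inference signal.
# Assumes a Hugging Face causal LM fine-tuned with one of the methods
# discussed (full model, head only, or adapters). The model name and
# threshold below are illustrative placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sequence_loss(text: str) -> float:
    """Average token-level cross-entropy of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

def is_likely_member(text: str, threshold: float = 3.0) -> bool:
    """Flag `text` as a likely fine-tuning-set member if its loss is
    unusually low; the threshold must be calibrated on non-member data."""
    return sequence_loss(text) < threshold
```

Comparing how sharply this signal separates members from non-members across full-model, head-only, and adapter fine-tuning is one way to operationalize the differences in attack susceptibility described above.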