When applying recent advances in Natural Language Understanding (NLU) to real-world applications, privacy preservation poses a crucial challenge that, unfortunately, has not been well resolved. To address this issue, we study how to improve the effectiveness of NLU models under a Local Privacy setting, using BERT, a widely used pretrained Language Model (LM), as an example. We systematically study the strengths and weaknesses of imposing dχ-privacy, a relaxed variant of Local Differential Privacy, at different stages of language modeling: input text, token embeddings, and sequence representations. We then focus on the former two through privacy-constrained fine-tuning experiments that reveal the utility of BERT under local privacy constraints. More importantly, to the best of our knowledge, we are the first to propose privacy-adaptive LM pretraining methods, and we demonstrate that they can significantly improve model performance on privatized text input. We also interpret the level of privacy preservation and provide guidance on selecting privacy parameters.
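To make the dχ-privacy mechanism concrete, the minimal sketch below illustrates the standard multivariate Laplace-style mechanism commonly used for this metric (noise with density proportional to exp(−ε‖z‖)), applied to token embeddings and, for text-level privatization, decoded back to the nearest vocabulary token. This is an illustrative assumption, not the paper's exact procedure: the abstract does not specify the mechanism or its parameters, and all function names here are hypothetical.

```python
import numpy as np

def dchi_noise(dim: int, epsilon: float, rng=None):
    """Sample noise z with density proportional to exp(-epsilon * ||z||),
    the multivariate analogue of the Laplace mechanism used for
    dχ-privacy (a sketch; assumed, not taken from the abstract).
    Sampling: uniform direction on the unit sphere, magnitude drawn
    from Gamma(dim, 1/epsilon)."""
    rng = rng or np.random.default_rng()
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return magnitude * direction

def privatize_embeddings(embeddings, epsilon: float, rng=None):
    """Add dχ-privacy noise to each token embedding (privacy at the
    token-embedding stage). `embeddings` is an (n, d) array."""
    return np.stack(
        [e + dchi_noise(e.shape[0], epsilon, rng) for e in embeddings]
    )

def privatize_tokens(embeddings, vocab_matrix, epsilon: float, rng=None):
    """Privacy at the input-text stage: noise each embedding, then
    replace it with the nearest vocabulary token (nearest-neighbor
    decoding). `vocab_matrix` is a (V, d) matrix of token embeddings;
    returns indices of the replacement tokens."""
    noised = privatize_embeddings(embeddings, epsilon, rng)
    dists = np.linalg.norm(
        noised[:, None, :] - vocab_matrix[None, :, :], axis=-1
    )
    return dists.argmin(axis=1)
```

Under this mechanism, smaller ε means larger noise magnitudes and stronger privacy but lower downstream utility, which is the trade-off the privacy parameter guidance addresses.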