Privacy preservation remains a key challenge in data mining and Natural Language Understanding (NLU). Previous research shows that input text, and even text embeddings, can leak private information. This concern motivates our research on effective privacy preservation approaches for pretrained Language Models (LMs). We investigate the privacy and utility implications of applying dχ-privacy, a variant of Local Differential Privacy, to BERT fine-tuning in NLU applications. More importantly, we propose privacy-adaptive LM pretraining methods and show that they can substantially improve the utility of BERT while retaining the same level of privacy protection. We also quantify the level of privacy preservation and provide guidance on privacy configuration. Our experiments and findings lay the groundwork for future explorations of privacy-preserving NLU with pretrained LMs.
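For context, a minimal sketch of a dχ-privacy text perturbation mechanism commonly used in this line of work (noise added to a word's embedding, then decoded to the nearest vocabulary word, so only the perturbed word leaves the client) is shown below. The vocabulary, embeddings, and function name here are hypothetical toy stand-ins, not the paper's actual setup, which applies perturbation within BERT's representation pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary with random 50-d embeddings (for illustration only).
vocab = ["good", "great", "bad", "movie", "film"]
emb = {w: rng.normal(size=50) for w in vocab}

def dx_privatize(word: str, epsilon: float) -> str:
    """Release a privatized replacement for `word` under dχ-privacy."""
    v = emb[word]
    d = v.shape[0]
    # Sample noise with density proportional to exp(-epsilon * ||z||):
    # a uniform direction on the unit sphere, scaled by a Gamma(d, 1/epsilon) radius.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = v + radius * direction
    # Decode to the nearest vocabulary embedding; only this word is released.
    return min(vocab, key=lambda w: np.linalg.norm(emb[w] - noisy))

print(dx_privatize("good", epsilon=10.0))  # smaller epsilon -> noisier, more private output
```

Note that the privacy guarantee scales with the distance between embeddings: words with nearby embeddings are harder to distinguish, which is what lets utility survive at moderate epsilon values.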