With the increasing adoption of NLP models in real-world products, it is increasingly important to protect these models from privacy leakage. Because private information in language data is sparse, previous research formalized a Selective Differential Privacy (SDP) notion to protect the sensitive tokens detected by policy functions, and proved its effectiveness on RNN-based models. However, the previous mechanism requires separating the private and public model parameters, and thus cannot be applied to large attention-based models. In this paper, we propose a simple yet effective just-fine-tune-twice privacy mechanism that first fine-tunes the model on redacted in-domain data and then on the original private in-domain data, achieving SDP for large Transformer-based language models. We also design explicit and contextual policy functions to provide protection at different levels. Experiments show that our models achieve strong performance while remaining robust to the canary insertion attack. We further show that even in low-resource settings with only a small amount of in-domain data, SDP can still improve model utility. We will release the code, data, and models to facilitate future research.
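To make the two-phase recipe concrete, the following is a minimal, self-contained sketch of the just-fine-tune-twice idea on a toy next-word model. Everything here is illustrative rather than the paper's actual implementation: the `TinyLM` model, the digit-masking `redact` policy function, the clip norm `C`, and the noise multiplier `sigma` are all assumptions, and the second phase uses plain hand-rolled DP-SGD (per-example gradient clipping plus Gaussian noise) as a stand-in for the paper's selective private training mechanism.

```python
import re
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
MASK = "<MASK>"

def redact(text: str) -> str:
    """Explicit policy function (illustrative): treat digit spans as sensitive."""
    return re.sub(r"\d+", MASK, text)

# Toy in-domain corpus; the redacted copy is used in phase 1.
private_texts = ["my pin is 1234", "call me at 555 0199"]
redacted_texts = [redact(t) for t in private_texts]

# Tiny word-level vocabulary over both corpora.
words = sorted({w for t in private_texts + redacted_texts for w in t.split()})
vocab = {w: i for i, w in enumerate(words)}

def encode(text):
    ids = [vocab[w] for w in text.split()]
    return torch.tensor(ids[:-1]), torch.tensor(ids[1:])  # next-word pairs

class TinyLM(nn.Module):
    def __init__(self, v, d=16):
        super().__init__()
        self.emb = nn.Embedding(v, d)
        self.out = nn.Linear(d, v)
    def forward(self, x):
        return self.out(self.emb(x))

def nll(model, text):
    x, y = encode(text)
    return F.cross_entropy(model(x), y)

model = TinyLM(len(vocab))

# Phase 1: ordinary fine-tuning on redacted data. No noise is needed here,
# since the policy function has already masked the sensitive tokens.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(20):
    for t in redacted_texts:
        opt.zero_grad()
        nll(model, t).backward()
        opt.step()

# Phase 2: private fine-tuning on the original data with hand-rolled DP-SGD.
# C (clip norm) and sigma (noise multiplier) are illustrative values.
C, sigma, lr = 1.0, 0.5, 0.05
params = list(model.parameters())
for _ in range(20):
    grads = [torch.zeros_like(p) for p in params]
    for t in private_texts:                       # per-example gradients
        model.zero_grad()
        nll(model, t).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = torch.clamp(C / (norm + 1e-6), max=1.0)  # clip to norm C
        for g, p in zip(grads, params):
            g += p.grad * scale
    with torch.no_grad():
        for g, p in zip(grads, params):
            g += sigma * C * torch.randn_like(g)  # Gaussian noise on the sum
            p -= lr * g / len(private_texts)      # averaged noisy update
```

The sketch mirrors the intended division of labor: phase 1 lets the model learn the in-domain distribution from redacted text at no privacy cost, so phase 2 only needs a (noisier, more expensive) private mechanism to recover the residual signal in the original data.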