In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs). To preserve UeDP, we develop a novel algorithm, called UeDP-Alg, that optimizes the trade-off between privacy loss and model utility with a tight sensitivity bound derived from seamlessly combining user and sensitive entity sampling processes. Extensive theoretical analysis and evaluation show that our UeDP-Alg outperforms baseline approaches in model utility under the same privacy budget consumption on several NLM tasks, using benchmark datasets.