Recent developments in deep learning have led to great success in various natural language processing (NLP) tasks. However, these applications may involve data containing sensitive information. Therefore, how to achieve good performance while also protecting the privacy of sensitive data is a crucial challenge in NLP. Differential Privacy (DP), which can prevent reconstruction attacks and protect against potential side knowledge, has become a de facto standard technique for private data analysis. In recent years, NLP under DP models (DP-NLP) has been studied from different perspectives, and these efforts deserve a comprehensive review. In this paper, we provide the first systematic review of recent advances in DP deep learning models for NLP. In particular, we first discuss the differences and additional challenges of DP-NLP compared with standard DP deep learning. We then survey existing work on DP-NLP and present its recent developments from two aspects: gradient-perturbation-based methods and embedding-vector-perturbation-based methods. Finally, we discuss some open challenges and future directions of this topic.
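To make the first family of methods concrete, the following is a minimal sketch of one gradient-perturbation (DP-SGD-style) update step: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, calibrated Gaussian noise is added, and the result is averaged before the parameter update. This is an illustrative simplification in plain Python (all function and variable names are our own, not from any specific library), not the exact procedure of any surveyed work.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params):
    """One illustrative DP-SGD update step.

    per_example_grads: list of per-example gradient vectors (lists of floats)
    clip_norm:         L2 clipping bound C for each per-example gradient
    noise_multiplier:  noise scale relative to C (sigma = noise_multiplier * C)
    lr:                learning rate
    params:            current parameter vector (list of floats)
    """
    n = len(per_example_grads)
    dim = len(params)
    summed = [0.0] * dim
    for g in per_example_grads:
        # Clip each per-example gradient to L2 norm at most clip_norm.
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for i in range(dim):
            summed[i] += g[i] * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    sigma = noise_multiplier * clip_norm
    noisy_avg = [(s + random.gauss(0.0, sigma)) / n for s in summed]
    # Standard gradient-descent update on the noisy averaged gradient.
    return [p - lr * d for p, d in zip(params, noisy_avg)]
```

Because noise is added to clipped gradients rather than to inputs or embeddings, the privacy guarantee here is at the level of individual training examples; the embedding-vector-perturbation methods discussed later instead inject noise into the text representations themselves.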