As the tide of Big Data continues to shape the landscape of Natural Language Processing (NLP), modern NLP methods have become grounded in such data in order to tackle a wide variety of text-based tasks. This data can undoubtedly include private or otherwise personally identifiable information. As such, the question of privacy in NLP has gained prominence in recent years, coinciding with the development of new Privacy-Enhancing Technologies (PETs). Among these PETs, Differential Privacy boasts several desirable qualities in the conversation surrounding data privacy. Naturally, the question becomes whether Differential Privacy is applicable in the largely unstructured realm of NLP. This topic has sparked novel research, unified by one fundamental goal: how can Differential Privacy be adapted to NLP methods? This paper aims to summarize the vulnerabilities addressed by Differential Privacy, the current thinking, and above all, the crucial next steps that must be considered.