Differential privacy provides a formal approach to protecting the privacy of individuals. Applications of differential privacy in various scenarios, such as protecting users' original utterances, must satisfy certain mathematical properties. Our contribution is a formal analysis of ADePT, a differentially private auto-encoder for text rewriting (Krishna et al., 2021). ADePT achieves promising results on downstream tasks while claiming tight privacy guarantees. Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated. We also quantify the impact of the error in its private mechanism, showing that the true sensitivity is higher by a factor of at least 6 even in the optimistic case of a very small encoder dimension, and that the proportion of utterances that are not privatized could easily reach 100% of the entire dataset. Our intention is neither to criticize the authors nor the peer-reviewing process, but rather to point out that if differential privacy applications in NLP rely on formal guarantees, these should be spelled out in full and subjected to detailed scrutiny.