As privacy gains traction in the NLP community, researchers have started adopting various privacy-preserving methods. One of the favorite privacy frameworks, differential privacy (DP), is perhaps the most compelling thanks to its fundamental theoretical guarantees. Despite the apparent simplicity of the general concept of differential privacy, it seems non-trivial to get it right when applying it to NLP. In this short paper, we formally analyze several recent NLP papers proposing text representation learning using DPText (Beigi et al., 2019a,b; Alnasser et al., 2021; Beigi et al., 2021) and reveal their false claims of being differentially private. Furthermore, we also show a simple yet general empirical sanity check to determine whether a given implementation of a DP mechanism almost certainly violates the privacy loss guarantees. Our main goal is to raise awareness and help the community understand potential pitfalls of applying differential privacy to text representation learning.
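To make the idea of such an empirical sanity check concrete, the sketch below illustrates one common way to test a DP implementation: run the mechanism many times on two neighboring inputs, histogram the outputs, and compare the largest observed log-probability ratio against the claimed ε. This is a minimal, hypothetical sketch, not the paper's exact procedure; the Laplace mechanisms, sample counts, bin sizes, and decision threshold are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(x, epsilon, n, sensitivity=1.0):
    # Correctly calibrated baseline: noise scale b = sensitivity / epsilon.
    return x + np.random.laplace(0.0, sensitivity / epsilon, size=n)

def miscalibrated_mechanism(x, epsilon, n, sensitivity=1.0):
    # Deliberately broken variant: noise is 10x too small, so the claimed
    # epsilon cannot hold. Stands in for a buggy "DP" implementation.
    return x + np.random.laplace(0.0, sensitivity / (10 * epsilon), size=n)

def empirical_privacy_loss(mechanism, x, x_neighbor, epsilon,
                           n_samples=200_000, bins=200, min_count=100):
    """Estimate max_o |log P[M(x)=o] - log P[M(x')=o]| from sampled outputs."""
    out_x = mechanism(x, epsilon, n_samples)
    out_xn = mechanism(x_neighbor, epsilon, n_samples)
    edges = np.linspace(min(out_x.min(), out_xn.min()),
                        max(out_x.max(), out_xn.max()), bins + 1)
    p, _ = np.histogram(out_x, bins=edges)
    q, _ = np.histogram(out_xn, bins=edges)
    # Ignore sparse tail bins where the histogram estimate is too noisy.
    mask = (p >= min_count) & (q >= min_count)
    return np.abs(np.log(p[mask] / q[mask])).max()

if __name__ == "__main__":
    eps = 1.0
    for name, mech in [("calibrated", laplace_mechanism),
                       ("mis-calibrated", miscalibrated_mechanism)]:
        loss = empirical_privacy_loss(mech, x=0.0, x_neighbor=1.0, epsilon=eps)
        # The 1.5x slack absorbs sampling noise; a loss far above eps is
        # strong evidence (not proof) that the claimed guarantee is violated.
        verdict = "consistent with eps" if loss <= 1.5 * eps else "almost certainly violates eps"
        print(f"{name}: empirical privacy loss ~ {loss:.2f} vs eps = {eps} -> {verdict}")
```

Note that such a check can only falsify a DP claim, never verify it: a mechanism can pass on the tested pair of neighboring inputs and still leak elsewhere, which is why it serves as a sanity check rather than a substitute for a formal proof.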