Text rewriting with differential privacy (DP) provides concrete theoretical guarantees for protecting the privacy of individuals in textual documents. In practice, existing systems may lack the means to validate their privacy-preserving claims, leading to problems of transparency and reproducibility. We introduce DP-Rewrite, an open-source framework for differentially private text rewriting which aims to solve these problems by being modular, extensible, and highly customizable. Our system incorporates a variety of downstream datasets, models, pre-training procedures, and evaluation metrics to provide a flexible way to conduct and validate private text rewriting research. To demonstrate our software in practice, we provide a set of experiments as a case study on the ADePT DP text rewriting system, in which we detect a privacy leak in its pre-training approach. Our system is publicly available, and we hope that it will help the community make DP text rewriting research more accessible and transparent.