ML models are ubiquitous in real-world applications and are a constant focus of research. At the same time, the community has begun to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while DP has seen some adoption in industry, attempts to apply it to real-world, complex ML models remain few and far between. Adoption of DP is hindered by limited practical guidance on what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered across papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments, and on which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information on achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement; we include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For practitioners, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, so we propose a set of specific best practices for stating these guarantees.