In recent years, formal privacy-preserving methods such as differential privacy (DP), which can be deployed in data-driven tasks such as machine learning (ML), have emerged. Reconciling large-scale ML with the closed-form reasoning required for a principled analysis of individual privacy loss requires new tools for automatic sensitivity analysis and for tracking an individual's data and its features through the flow of computation. For this purpose, we introduce a novel \textit{hybrid} automatic differentiation (AD) system which combines the efficiency of reverse-mode AD with the ability to obtain a closed-form expression for any given quantity in the computational graph. This enables modelling the sensitivity of arbitrary compositions of differentiable functions, such as the training of neural networks on private data. We demonstrate our approach by analysing the individual DP guarantees of statistical database queries. Moreover, we investigate the application of our technique to the training of DP neural networks. Our approach can enable principled reasoning about privacy loss in data-processing settings and further the development of automatic sensitivity analysis and privacy budgeting systems.
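As a minimal illustration of the closed-form reasoning referred to above (a sketch under assumed conventions, not the paper's actual implementation), the following Python fragment mirrors a small computational graph symbolically via \texttt{sympy}, so that the partial derivative of a statistical query with respect to a single individual's record, and hence a per-record sensitivity bound, is available in closed form rather than only as a numerical value; the variable names and the clipping domain are illustrative assumptions.

\begin{verbatim}
# Illustrative sketch only: a toy symbolic mirror of a computational
# graph, standing in for the closed-form side of a hybrid AD system.
import sympy as sp

# Symbolic database of n records; x0 plays the role of one
# individual's datum.
n = 5
x = sp.symbols(f"x0:{n}", real=True)

# A simple statistical query: the mean of the database.
query = sp.Add(*x) / n

# Closed-form dependence of the query on the individual record x0:
# the partial derivative is obtained symbolically (here, 1/n).
grad_x0 = sp.diff(query, x[0])
print("d(query)/d(x0) =", grad_x0)

# If each record is clipped to [0, 1], a per-record L1 sensitivity
# bound follows in closed form: |f(D) - f(D')| <= 1/n for databases
# D, D' differing in one record.
sensitivity = sp.Abs(grad_x0) * 1  # domain width 1 after clipping
print("per-record sensitivity bound:", sensitivity)
\end{verbatim}

In a full hybrid AD system of the kind the abstract describes, such a symbolic view would be maintained alongside an efficient reverse-mode tape, so that closed-form expressions remain obtainable for any intermediate quantity of a larger composition (e.g., a neural network's training step) without giving up the performance of reverse-mode AD.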