We consider the privacy-preserving machine learning (ML) setting where the trained model must satisfy differential privacy (DP) with respect to the labels of the training examples. We propose two novel approaches based, respectively, on the Laplace mechanism and the PATE framework, and demonstrate their effectiveness on standard benchmarks. While recent work by Ghazi et al. proposed label DP schemes based on a randomized response mechanism, we argue that additive Laplace noise coupled with Bayesian inference (ALIBI) is a better fit for typical ML tasks. Moreover, we show how to achieve very strong privacy levels in some regimes via our adaptation of the PATE framework, which builds on recent advances in semi-supervised learning. We complement the theoretical analysis of our algorithms' privacy guarantees with an empirical evaluation of their memorization properties. Our evaluation suggests that comparing algorithms solely by their provable DP guarantees can be misleading: it can favor a less private algorithm that merely admits a tighter analysis.
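To make the ALIBI idea concrete, here is a minimal sketch of a label-DP Laplace release followed by Bayesian post-processing. This is an illustration under our own assumptions, not the paper's exact implementation: the function names are hypothetical, the prior defaults to uniform, and the noise scale 2/ε follows the standard L1-sensitivity argument (changing one example's label moves two coordinates of its one-hot vector by 1 each).

```python
import numpy as np

def alibi_noisy_label(y_onehot, epsilon, rng=None):
    """Release one example's label with epsilon-label-DP via the Laplace mechanism.

    Changing the label alters two coordinates of the one-hot vector by 1 each,
    so the L1 sensitivity is 2 and the per-coordinate noise scale is 2/epsilon.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.laplace(loc=0.0, scale=2.0 / epsilon, size=y_onehot.shape)
    return y_onehot + noise

def bayesian_label_posterior(z, epsilon, prior=None):
    """Posterior over the true label given the noisy release z.

    The likelihood of z under candidate label k is a product of Laplace
    densities centered at the one-hot vector e_k; combined with the prior,
    this yields soft labels usable with a cross-entropy-style loss.
    """
    K = z.shape[-1]
    prior = np.full(K, 1.0 / K) if prior is None else prior
    scale = 2.0 / epsilon
    # Log-likelihood of z for each candidate label k (constants cancel below).
    loglik = np.array([
        -np.abs(z - np.eye(K)[k]).sum() / scale for k in range(K)
    ])
    logpost = loglik + np.log(prior)
    logpost -= logpost.max()  # numerical stability before exponentiating
    post = np.exp(logpost)
    return post / post.sum()
```

Because each example's label is released independently, the guarantee holds per example by parallel composition, and the resulting posterior vectors can be fed to a standard training loop as soft targets.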