Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8 × 10^{-7})-DP, and achieve 81.1% under (8, 8 × 10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
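The noise-injection mechanism the abstract refers to can be illustrated with a minimal sketch of one DP-SGD update: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, and Gaussian noise calibrated to the clipping bound is added before the averaged step. All names and parameter values below (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are illustrative, not the paper's implementation.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update.

    per_example_grads: array of shape (batch_size, num_params),
    one gradient row per training example.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = per_example_grads.shape[0]
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add isotropic Gaussian noise with std proportional to the clipping
    # bound (this scaling is what ties the noise norm to model dimension),
    # then average over the batch and take a gradient step.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / n
    return params - lr * noisy_mean
```

With `noise_multiplier=0` this reduces to ordinary clipped SGD, which makes the privacy/utility trade-off explicit: increasing the multiplier strengthens the guarantee while degrading the gradient signal.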