Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA of 81.4% on CIFAR-10 without extra data under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8 \cdot 10^{-7})-DP. We also achieve 86.7% top-1 accuracy under (8, 8 \cdot 10^{-7})-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
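To make the mechanism described above concrete, here is a minimal sketch of a single DP-SGD step in JAX. The model, loss, and hyper-parameters (loss_fn, clip_norm, noise_mult) are hypothetical placeholders for illustration, not the paper's training setup. The sketch shows the two ingredients DP-SGD relies on: clipping each per-example gradient to a fixed L2 norm, then adding isotropic Gaussian noise; since the noise is added to every coordinate, its expected norm grows with the square root of the model dimension, which is the scaling behind the large-model concern the abstract mentions.

```python
# Minimal DP-SGD step sketch (illustrative; not the paper's implementation).
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Hypothetical linear model with squared loss, for illustration only.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def dp_sgd_step(params, xs, ys, key, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    # Per-example gradients via vmap over the batch dimension.
    per_ex_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = jnp.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads * jnp.minimum(1.0, clip_norm / norms)
    # Add isotropic Gaussian noise scaled by clip_norm * noise_mult to the
    # summed clipped gradients, then average over the batch.
    noise = noise_mult * clip_norm * jax.random.normal(key, params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / xs.shape[0]
    return params - lr * noisy_grad

# Toy usage with synthetic data (hypothetical).
key = jax.random.PRNGKey(0)
params = jnp.zeros(3)
xs = jax.random.normal(key, (8, 3))
ys = xs @ jnp.array([1.0, -2.0, 0.5])
params = dp_sgd_step(params, xs, ys, jax.random.PRNGKey(1))
```

Because the noise vector lives in the full parameter space while each clipped gradient has norm at most clip_norm, the signal-to-noise ratio per step degrades as models grow unless the batch size or other aspects of training are adjusted, which is the trade-off this work revisits.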