Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy. Training models with DP protects against leakage of sensitive data in a potentially adversarial setting. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. DP-SGD protects against leakage by injecting noise into the individual example gradients, such that the trained model weights become nearly independent of the use of any particular training example. While this guarantee is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility under DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can substantially improve performance under DP. By switching from DP-SGD to DP-LAMB, we observe improvements of up to 20 percentage points (absolute). Finally, we show that finetuning just the last layer for a \emph{single step} in the full-batch setting yields state-of-the-art (SOTA) accuracy of 81.7$\%$ across a wide privacy budget range of $\epsilon \in [4, 10]$ and $\delta = 10^{-6}$, while substantially reducing the computational overhead.
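To make the mechanism concrete, here is a minimal sketch of the DP-SGD update described above, in standard notation that we introduce for exposition (clipping norm $C$, noise multiplier $\sigma$, minibatch $B_t$, learning rate $\eta$; these symbols are not defined in the abstract itself):
$$g_i = \nabla_\theta \ell(\theta_t, x_i), \qquad \bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right), \qquad \theta_{t+1} = \theta_t - \frac{\eta}{|B_t|}\left(\sum_{i \in B_t} \bar{g}_i + \mathcal{N}\!\left(0, \sigma^2 C^2 \mathbf{I}\right)\right).$$
Each per-example gradient is clipped to norm at most $C$ before Gaussian noise calibrated to $C$ is added, which is what bounds the influence of any single training example on the final weights.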