Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy. Training models with DP protects against leakage of sensitive data in a potentially adversarial setting. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. DP-SGD protects against leakage by injecting noise into the individual example gradients, such that the trained model weights become nearly independent of the use of any particular training example. While this guarantee is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than that of non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility under DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that, similar to the non-private setting, the choice of optimizer can substantially improve performance under DP. By switching from DP-SGD to DP-LAMB, we observe improvements of up to 20 percentage points (absolute). Finally, we show that finetuning just the last layer for a \emph{single step} in the full-batch setting yields state-of-the-art (SOTA) accuracy of 81.7$\%$ across a wide privacy budget range of $\epsilon \in [4, 10]$ and $\delta = 10^{-6}$, while substantially reducing the computational overhead.
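To make the mechanism concrete, here is a minimal sketch of the DP-SGD update described above, in standard notation that we introduce for exposition (clipping norm $C$, noise multiplier $\sigma$, minibatch $B_t$, learning rate $\eta$; these symbols are not defined in the abstract itself):
$$g_i = \nabla_\theta \ell(\theta_t, x_i), \qquad \bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right), \qquad \theta_{t+1} = \theta_t - \frac{\eta}{|B_t|}\left(\sum_{i \in B_t} \bar{g}_i + \mathcal{N}\!\left(0, \sigma^2 C^2 \mathbf{I}\right)\right).$$
Each per-example gradient is clipped to norm at most $C$ before Gaussian noise calibrated to $C$ is added, which is what bounds the influence of any single training example on the final weights.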