Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better for DP. Combined, the methods we discuss let us train a ResNet-18 with differential privacy to 47.9% accuracy and privacy parameters $\epsilon = 10, \delta = 10^{-6}$, a significant improvement over "naive" DP-SGD training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. We share our code at https://github.com/google-research/dp-imagenet, calling for others to join us in moving the needle further on DP at scale.
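For readers unfamiliar with the "naive" DP-SGD baseline the abstract refers to, the core mechanism is to clip each example's gradient to a fixed norm bound and add Gaussian noise calibrated to that bound before the update. The sketch below is a minimal, framework-free illustration of one such step; the function name, hyperparameter values, and NumPy setting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update (illustrative sketch, not the paper's code):
    clip each per-example gradient to clip_norm, average the clipped
    gradients, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient down only if its norm exceeds the bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation is noise_multiplier * clip_norm / batch_size,
    # matching the sensitivity of the averaged, clipped sum.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=avg.shape,
    )
    return params - lr * (avg + noise)
```

The clipping bounds each example's influence on the update (the "sensitivity"), which is what lets the added Gaussian noise translate into a formal $(\epsilon, \delta)$ guarantee via the moments accountant; the per-example gradient computation is also the main source of DP training's slowdown that the paper's speed-up techniques target.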