Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better for DP. Combined, the methods we discuss let us train a ResNet-18 with differential privacy to 47.9% accuracy and privacy parameters $\epsilon = 10, \delta = 10^{-6}$, a significant improvement over "naive" DP-SGD training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. We share our code at https://github.com/google-research/dp-imagenet, calling for others to join us in moving the needle further on DP at scale.
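For readers unfamiliar with the "naive" DP-SGD baseline the abstract refers to, the core mechanism is to clip each example's gradient to a fixed norm bound and add Gaussian noise calibrated to that bound before the update. The sketch below is a minimal, framework-free illustration of one such step; the function name, hyperparameter values, and NumPy setting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update (illustrative sketch, not the paper's code):
    clip each per-example gradient to clip_norm, average the clipped
    gradients, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient down only if its norm exceeds the bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation is noise_multiplier * clip_norm / batch_size,
    # matching the sensitivity of the averaged, clipped sum.
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=avg.shape,
    )
    return params - lr * (avg + noise)
```

The clipping bounds each example's influence on the update (the "sensitivity"), which is what lets the added Gaussian noise translate into a formal $(\epsilon, \delta)$ guarantee via the moments accountant; the per-example gradient computation is also the main source of DP training's slowdown that the paper's speed-up techniques target.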