A major direction in differentially private machine learning is differentially private fine-tuning: pretraining a model on a source of "public data" and transferring the extracted features to downstream tasks. This is an important setting because many industry deployments fine-tune publicly available feature extractors on proprietary data for downstream tasks. In this paper, we use features extracted from state-of-the-art open-source models to solve benchmark tasks in computer vision and natural language processing using differentially private fine-tuning. Our key insight is that by accelerating training, we can quickly drive the model parameters to regions in parameter space where the impact of noise is minimized. In doing so, we recover the same performance as non-private fine-tuning for realistic values of ε ∈ [0.01, 1.0] on benchmark image classification datasets, including CIFAR100.
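To make the setting concrete, here is a minimal sketch of the standard approach the abstract describes: freeze the public feature extractor, then train a linear head on the extracted features with DP-SGD (per-example gradient clipping plus Gaussian noise, as in Abadi et al., 2016). This is an illustrative reconstruction, not the paper's exact training recipe; the function name and hyperparameter defaults are assumptions.

```python
import numpy as np

def dp_sgd_linear(X, y, num_classes, epochs=10, lr=4.0,
                  clip_norm=1.0, noise_multiplier=1.0,
                  batch_size=256, rng=None):
    """DP-SGD on a linear (softmax) head over pre-extracted features.

    Per-example gradients are clipped to `clip_norm` and summed, Gaussian
    noise with std `noise_multiplier * clip_norm` is added, and the result
    is averaged -- the standard DP-SGD update. Hyperparameters here are
    illustrative, not the paper's reported settings.
    """
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch], y[batch]
            # softmax cross-entropy gradient w.r.t. logits, per example
            logits = xb @ W
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            probs[np.arange(len(yb)), yb] -= 1.0
            # per-example gradient of W is outer(x_i, g_i); its Frobenius
            # norm factors as ||x_i|| * ||g_i||, so we can clip without
            # materializing each per-example gradient matrix
            norms = np.linalg.norm(xb, axis=1) * np.linalg.norm(probs, axis=1)
            scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
            grad = (xb * scale[:, None]).T @ probs
            grad += rng.normal(0.0, noise_multiplier * clip_norm, grad.shape)
            W -= lr * grad / len(batch)
    return W
```

The "accelerating training" insight maps onto this loop as using a large learning rate and few noisy updates: fewer iterations mean fewer noise injections, so the parameters reach a good region before accumulated noise dominates.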