All state-of-the-art (SOTA) differentially private machine learning (DP ML) methods are iterative in nature, and their privacy analyses allow publicly releasing the intermediate training checkpoints. However, DP ML benchmarks, and even practical deployments, typically use only the final training checkpoint to make predictions. In this work, for the first time, we comprehensively explore various methods that aggregate intermediate checkpoints to improve the utility of DP training. Empirically, we demonstrate that checkpoint aggregations provide significant gains in prediction accuracy over the existing SOTA for the CIFAR10 and StackOverflow datasets, and that these gains are magnified in settings with periodically varying training data distributions. For instance, we improve SOTA StackOverflow accuracies to 22.7% (+0.43% absolute) for $\epsilon=8.2$, and 23.84% (+0.43%) for $\epsilon=18.9$. Theoretically, we show that uniform tail averaging of checkpoints improves the empirical risk minimization bound compared to the last checkpoint of DP-SGD. Lastly, we initiate an exploration into estimating the uncertainty that DP noise adds to the predictions of DP ML models. We prove that, under standard assumptions on the loss function, the sample variance of the last few checkpoints provides a good approximation of the variance of the final model of a DP run. Empirically, we show that the last few checkpoints can provide a reasonable lower bound on the variance of a converged DP model.
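The two aggregations named above, uniform tail averaging of checkpoints and the sample variance of the last few checkpoints, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the representation of a checkpoint as a list of per-layer NumPy arrays and the window size `k` are assumptions for the example.

```python
import numpy as np

def tail_average(checkpoints, k):
    """Uniformly average the parameters of the last k checkpoints.

    `checkpoints` is assumed to be a list of models, each a list of
    per-layer parameter arrays (an illustrative representation).
    """
    tail = checkpoints[-k:]
    # Average each layer across the tail of the training run.
    return [np.mean(layer_stack, axis=0) for layer_stack in zip(*tail)]

def tail_sample_variance(checkpoints, k):
    """Per-parameter sample variance over the last k checkpoints,
    used as a proxy for the variance that DP noise induces in the
    final model of a DP run."""
    tail = checkpoints[-k:]
    return [np.var(layer_stack, axis=0, ddof=1) for layer_stack in zip(*tail)]
```

For example, with checkpoints whose single layer holds the constant values 0 through 4, `tail_average(ckpts, 3)` returns the mean of the last three (3.0) and `tail_sample_variance(ckpts, 3)` their unbiased variance (1.0).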