Leveraging transfer learning has recently been shown to be an effective strategy for training large models with Differential Privacy (DP). Moreover, somewhat surprisingly, recent works have found that privately training just the last layer of a pre-trained model provides the best utility under DP. While past studies largely rely on algorithms like DP-SGD for training large models, in the specific case of privately learning from features, we observe that the computational burden is low enough to allow for more sophisticated optimization schemes, including second-order methods. To that end, we systematically explore the effect of design parameters such as the loss function and the optimization algorithm. We find that, while the commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting: linear regression is much more effective than logistic regression from both the privacy and computational standpoints, especially at stricter epsilon values ($\epsilon < 1$). On the optimization side, we also explore Newton's method, and find that second-order information is quite helpful even with privacy, although the benefit diminishes significantly under stricter privacy guarantees. While both methods use second-order information, least squares is effective at lower epsilon values while Newton's method is effective at larger ones. To combine the benefits of both, we propose a novel algorithm, DP-FC, which leverages the feature covariance instead of the Hessian of the logistic regression loss and performs well across all $\epsilon$ values we tried. With this, we obtain new SOTA results on ImageNet-1k, CIFAR-100, and CIFAR-10 across all values of $\epsilon$ typically considered. Most remarkably, on ImageNet-1K, we obtain top-1 accuracy of 88\% under $(8, 8 \cdot 10^{-7})$-DP and 84.3\% under $(0.1, 8 \cdot 10^{-7})$-DP.
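The appeal of private linear regression on features can be illustrated with the standard sufficient-statistics-perturbation recipe: add Gaussian noise once to the feature covariance $X^\top X$ and the correlation vector $X^\top y$, then solve the regularized normal equations. This is a minimal sketch of that general technique, not the paper's DP-FC algorithm; the function name, the single `noise_sigma` scale (which would be calibrated to a target $(\epsilon, \delta)$ via the Gaussian mechanism), and the ridge constant are illustrative assumptions.

```python
import numpy as np

def dp_linear_regression(X, y, noise_sigma, reg=1e-3, rng=None):
    """Private least squares via noisy sufficient statistics.

    Assumes each row of X has L2 norm <= 1 and |y_i| <= 1, so both
    X^T X and X^T y have bounded per-example sensitivity; noise_sigma
    is the Gaussian-mechanism scale implied by the target (eps, delta).
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Symmetrized Gaussian noise for the feature covariance X^T X.
    E = rng.normal(0.0, noise_sigma, size=(d, d))
    A = X.T @ X + (E + E.T) / np.sqrt(2)
    # Gaussian noise for the correlation vector X^T y.
    b = X.T @ y + rng.normal(0.0, noise_sigma, size=d)
    # The ridge term keeps the noisy covariance well-conditioned.
    return np.linalg.solve(A + reg * np.eye(d), b)
```

Because the noise is added to fixed-size statistics rather than per-iteration gradients, the cost is one pass over the features plus a $d \times d$ solve; as `noise_sigma` goes to zero the estimator recovers ordinary ridge regression.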