When applied to large-scale learning problems, the conventional approach to privacy-preserving deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD), has met with limited success due to significant performance degradation and high memory overhead compared with its non-private counterpart. We show how to mitigate the performance drop by replacing DP-SGD with a novel DP Forward-Propagation (DP-FP) mechanism followed by an off-the-shelf non-DP optimizer. Our DP-FP employs novel (1) representation clipping followed by noise addition in the forward-propagation stage, and (2) micro-batch construction via subsampling to achieve privacy amplification and reduce the noise power to $1/M$, where $M$ is the number of micro-batches per step. When training a classification model, our DP-FP, with all privacy-preserving operations acting on the representation, is innately free of the gradient bias, model-size-proportional total noise, and memory issues that afflict DP-SGD. As a result, DP-FP retains the same level of privacy while approaching non-private baselines and significantly outperforming state-of-the-art DP-SGD variants. For example, when applied to RoBERTa-large on four downstream tasks, DP-FP achieves an average accuracy of 91.34\% with privacy budgets less than 3, a 3.81\% improvement over the state-of-the-art DP-SGD and only a 0.9\% loss relative to the non-private baseline, but with a significantly lower privacy-leakage risk.
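To make the mechanism concrete, here is a minimal NumPy sketch of the two DP-FP ingredients named above: per-example clipping of the forward representation followed by noise addition, with the batch split into $M$ micro-batches so that averaging reduces the noise power by a factor of $M$. The function name `dp_fp_step`, the sensitivity scaling, and the noise calibration are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def dp_fp_step(representations, clip_norm=1.0, sigma=1.0, num_microbatches=4, rng=None):
    """Illustrative sketch (not the authors' code) of a DP-FP step:
    clip each example's forward representation to clip_norm, split the
    batch into M micro-batches, add Gaussian noise to each micro-batch
    mean, and average the noisy means."""
    rng = np.random.default_rng(rng)
    # Per-example clipping acts on the representation, not the gradient,
    # so no per-example gradients need to be stored (avoiding DP-SGD's
    # memory overhead).
    norms = np.linalg.norm(representations, axis=1, keepdims=True)
    clipped = representations * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Subsample into M micro-batches; noise is added once per micro-batch
    # mean (sensitivity clip_norm / micro-batch size), so averaging the M
    # noisy means reduces the noise variance by a factor of M.
    micro = np.array_split(clipped, num_microbatches)
    noisy_means = [
        m.mean(axis=0) + rng.normal(0.0, sigma * clip_norm / len(m), size=m.shape[1])
        for m in micro
    ]
    return np.mean(noisy_means, axis=0)
```

The privatized representation returned here would then feed the classification head, and any off-the-shelf non-DP optimizer can update the model, since the gradients themselves are never clipped or noised.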