We propose a reparametrization scheme to address the challenges of applying differentially private SGD to large neural networks: 1) the huge memory cost of storing individual gradients, and 2) the added noise suffering notorious dimensional dependence. Specifically, we reparametrize each weight matrix with two \emph{gradient-carrier} matrices of small dimension and a \emph{residual weight} matrix. We argue that such reparametrization keeps the forward/backward process unchanged while enabling us to compute the projected gradient without computing the gradient itself. To learn with differential privacy, we design \emph{reparametrized gradient perturbation (RGP)}, which perturbs the gradients on the gradient-carrier matrices and reconstructs an update for the original weight from the noisy gradients. Importantly, we use historical updates to find the gradient-carrier matrices, whose optimality is rigorously justified under linear regression and empirically verified on deep learning tasks. RGP significantly reduces the memory cost and improves the utility. For example, we are the first to apply differential privacy to the BERT model, achieving an average accuracy of $83.9\%$ on four downstream tasks with $\epsilon=8$, which is within $5\%$ of the non-private baseline while enjoying much lower privacy leakage risk.
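To make the pipeline concrete, the following is a minimal, illustrative sketch of the RGP idea on a single weight matrix, assuming PyTorch. The toy squared loss, the random stand-in for the historical update, and all variable names here are hypothetical; the actual method operates on full networks, derives the carriers from accumulated historical updates (e.g., via a decomposition such as power iteration), and uses a more careful reconstruction step.

```python
import torch

torch.manual_seed(0)
d_out, d_in, rank = 32, 64, 4          # carrier rank r << min(d_out, d_in)
batch, clip_norm, noise_mult, lr = 8, 1.0, 1.0, 0.1

W = torch.randn(d_out, d_in) * 0.01    # original weight (frozen within this step)
H = torch.randn(d_out, d_in)           # stand-in for the historical update

# 1) Gradient-carrier matrices from (a stand-in for) the historical update
#    via truncated SVD.
U, S, Vh = torch.linalg.svd(H, full_matrices=False)
L = U[:, :rank]                        # (d_out, r) left carrier
R = Vh[:rank, :]                       # (r, d_in)  right carrier

x = torch.randn(batch, d_in)           # toy inputs
y = torch.randn(batch, d_out)          # toy targets

# 2) Per-example gradients w.r.t. the small carriers only; the residual
#    weight (W - L R) is detached, so forward/backward behavior matches
#    using W itself while gradients flow only to L and R.
def carrier_grads(xi, yi):
    Lp = L.clone().requires_grad_(True)
    Rp = R.clone().requires_grad_(True)
    W_eff = (W - L @ R).detach() + Lp @ Rp
    loss = ((xi @ W_eff.T - yi) ** 2).mean()
    return torch.autograd.grad(loss, (Lp, Rp))

sum_gL = torch.zeros_like(L)
sum_gR = torch.zeros_like(R)
for i in range(batch):
    gL, gR = carrier_grads(x[i:i + 1], y[i:i + 1])
    # 3) Clip the joint norm of each example's carrier gradients.
    norm = torch.sqrt(gL.norm() ** 2 + gR.norm() ** 2)
    scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
    sum_gL += gL * scale
    sum_gR += gR * scale

# 4) Gaussian noise calibrated to the clipping norm (DP-SGD style).
sigma = noise_mult * clip_norm
gL_noisy = (sum_gL + sigma * torch.randn_like(sum_gL)) / batch
gR_noisy = (sum_gR + sigma * torch.randn_like(sum_gR)) / batch

# 5) Reconstruct a full-size update from the noisy carrier gradients
#    (simplified reconstruction) and apply it to the original weight.
delta_W = gL_noisy @ R + L @ gR_noisy
W = W - lr * delta_W
```

The memory saving comes from step 2: the per-example objects that must be stored and clipped have shapes $(d_{\text{out}}, r)$ and $(r, d_{\text{in}})$ rather than $(d_{\text{out}}, d_{\text{in}})$, and noise is added only in that low-dimensional space.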