Applying Differentially Private Stochastic Gradient Descent (DPSGD) to the training of modern, large-scale neural networks such as transformer-based models is challenging, because the magnitude of the noise added to the gradients at each iteration scales with the model dimension, significantly hindering learning. We propose a unified framework, $\textsf{LSG}$, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of the gradient updates and thereby alleviate the negative impact of DPSGD. The gradient updates are first approximated by a pair of low-rank matrices. A novel strategy is then used to sparsify the gradients, yielding low-dimensional, less noisy updates that nevertheless retain the performance of the network. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.
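To make the general pipeline concrete, the following is a minimal NumPy sketch of the idea described above: approximate a gradient matrix with a pair of low-rank factors, sparsify the result, and add Gaussian noise only to the surviving low-dimensional update. The function name `lowrank_sparse_dp_update`, the truncated-SVD factorization, the magnitude-based top-$k$ rule, and all parameter values are illustrative assumptions and not the paper's exact $\textsf{LSG}$ procedure; the sketch also omits per-example clipping and formal privacy accounting.

```python
import numpy as np


def lowrank_sparse_dp_update(grad, rank=4, sparsity=0.1,
                             clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Illustrative sketch: compress a gradient matrix before injecting DP noise."""
    rng = np.random.default_rng(0) if rng is None else rng

    # 1) Low-rank approximation: grad ~= (U_r * s_r) @ Vt_r, a pair of thin matrices.
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # 2) Sparsify: keep the top-k entries by magnitude (a generic stand-in
    #    for the paper's sparsification strategy).
    k = max(1, int(sparsity * low_rank.size))
    flat = low_rank.ravel()
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    vals = flat[keep]

    # 3) Clip the compressed update to bound its L2 norm (sensitivity).
    vals = vals * min(1.0, clip_norm / (np.linalg.norm(vals) + 1e-12))

    # 4) Add Gaussian noise only to the k surviving coordinates, so far less
    #    total noise enters the update than when noising the full gradient.
    vals = vals + rng.normal(scale=noise_multiplier * clip_norm, size=vals.shape)

    # 5) Scatter the noisy values back to the original shape for the optimizer step.
    update = np.zeros_like(flat)
    update[keep] = vals
    return update.reshape(grad.shape)


# Usage: one noisy, compressed update for a 256 x 512 weight matrix.
g = np.random.default_rng(1).standard_normal((256, 512))
noisy_update = lowrank_sparse_dp_update(g, rank=8, sparsity=0.05)
```

The key design point the sketch illustrates is that noise calibrated to the clipping norm is injected into only $k \ll d$ coordinates, so the total perturbation no longer grows with the full model dimension.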