On the Convergence of Deep Learning with Differential Privacy

In deep learning with differential privacy (DP), a neural network usually achieves privacy at the cost of slower convergence (and thus lower performance) than its non-private counterpart. This work gives the first convergence analysis of DP deep learning, through the lens of training dynamics and the neural tangent kernel (NTK). Our convergence theory characterizes the effects of the two key components of DP training: per-sample clipping (flat or layerwise) and noise addition. The analysis not only initiates a general, principled framework for understanding DP deep learning with any network architecture and loss function, but also motivates a new clipping method, global clipping, that significantly improves convergence while preserving the same privacy guarantee as the existing local clipping. On the theoretical side, we establish the precise connection between per-sample clipping and the NTK matrix. We show that in gradient flow, i.e., with an infinitesimal learning rate, the noise level of DP optimizers does not affect convergence. We prove that DP gradient descent (DP-GD) with global clipping guarantees monotone convergence to zero loss, a property that the existing DP-GD with local clipping can violate. Notably, our analysis framework extends easily to other optimizers, e.g., DP-Adam. Empirically, DP optimizers equipped with global clipping perform strongly on a wide range of classification and regression tasks. In particular, global clipping is surprisingly effective at learning calibrated classifiers, in contrast to existing DP classifiers, which are often over-confident and unreliable. Implementation-wise, the new clipping can be realized by adding one line of code to the Opacus library.
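
To make the two components named above concrete, the following is a minimal sketch of one DP-GD step with flat per-sample clipping followed by Gaussian noise addition, in the standard form commonly used for DP training. It is not the Opacus implementation, and it does not show the global clipping proposed here, whose rule is not specified in this abstract. The function names, the toy linear model, and the parameters `max_norm` and `noise_multiplier` are assumptions chosen for illustration.

```python
import math
import random


def clip_per_sample(grad, max_norm):
    """Scale one per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_gd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-GD step: clip each sample's gradient, sum, add noise, average."""
    n = len(per_sample_grads)
    clipped = [clip_per_sample(g, max_norm) for g in per_sample_grads]
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm
    # (an assumed, commonly used calibration).
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [p - lr * g / n for p, g in zip(params, noisy)]


def per_sample_grads_linear(params, xs, ys):
    """Per-sample gradients of 0.5 * (w . x - y)^2 for a toy linear model."""
    grads = []
    for x, y in zip(xs, ys):
        residual = sum(w * xi for w, xi in zip(params, x)) - y
        grads.append([residual * xi for xi in x])
    return grads


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [2.0, -1.0, 1.0]
    w = [0.0, 0.0]
    for _ in range(200):
        grads = per_sample_grads_linear(w, xs, ys)
        w = dp_gd_step(w, grads, lr=0.1, max_norm=1.0,
                       noise_multiplier=0.5, rng=rng)
    print(w)
```

In this sketch the clipping factor is computed separately for each sample from that sample's own gradient norm, which is the sense in which the clipping is per-sample. Setting `noise_multiplier` to 0 removes the noise and leaves only the effect of clipping on the update.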
