Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy; it requires clipping each individual gradient to bound its maximum norm and then adding isotropic Gaussian noise. By analyzing the convergence rate of DP-SGD in a non-convex setting, we reveal that randomly sparsifying gradients before clipping and noise addition adjusts a trade-off between internal components of the convergence bound and leads to a smaller upper bound when the noise is dominant. Moreover, our theoretical analysis and extensive empirical evaluations show that this trade-off is not trivial but possibly a unique property of DP-SGD: removing either the noise addition or the gradient clipping eliminates the trade-off in the bound. Based on this analysis, we propose random sparsification (RS), an efficient and lightweight approach for DP-SGD. Applying RS across various DP-SGD frameworks improves performance, and the sparse gradients it produces offer further advantages in reducing communication cost and strengthening security against reconstruction attacks, both of which are key problems in private machine learning.
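As a concrete illustration of the pipeline described above (sparsifying gradients at random before per-sample clipping and Gaussian noise addition), the following minimal NumPy sketch shows one such update step. The function name `rs_dp_sgd_step`, the `sparsity` parameter, and the choice of a single shared random mask per step are illustrative assumptions for this sketch, not the paper's reference implementation.

```python
import numpy as np

def rs_dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, sparsity, rng):
    """One DP-SGD update with random sparsification (RS) applied before
    clipping and noise addition. Minimal sketch; names are illustrative."""
    n, d = per_sample_grads.shape

    # Data-independent random mask: keep a fraction (1 - sparsity) of the
    # coordinates; the same mask is applied to every per-sample gradient.
    mask = rng.random(d) >= sparsity

    clipped_sum = np.zeros(d)
    for g in per_sample_grads:
        g_sparse = g * mask
        # Clip the sparsified per-sample gradient to norm at most clip_norm.
        scale = min(1.0, clip_norm / (np.linalg.norm(g_sparse) + 1e-12))
        clipped_sum += g_sparse * scale

    # Isotropic Gaussian noise calibrated to the clipping norm; restricting
    # it to the retained coordinates keeps the released update sparse.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d) * mask
    return (clipped_sum + noise) / n

# Example usage (hypothetical values):
rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10_000))  # 32 per-sample gradients
update = rs_dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1,
                        sparsity=0.5, rng=rng)
```

Because the mask is drawn independently of the data, the released update only touches the retained coordinates, which is what makes the resulting sparse gradients attractive for reducing communication cost.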