In the field of differential privacy (DP), random noise is traditionally injected uniformly across all training data instances. In this paper, we first give sharper excess risk bounds for the DP stochastic gradient descent (DP-SGD) method. Since most previous analyses assume convexity, we relax this assumption by working under the Polyak-{\L}ojasiewicz condition. Then, observing that different training data instances affect the machine learning model to different extents, we exploit the heterogeneity of the training data and improve the performance of DP-SGD from a new perspective. Specifically, by introducing the influence function (IF), we quantitatively measure the contribution of each training data instance to the final model. If the contribution of an instance is so small that an attacker cannot infer anything about it from the model, we add no noise when training with that instance. Based on this observation, we design a `Performance Improving' DP-SGD algorithm, PIDP-SGD. Theoretical and experimental results show that the proposed PIDP-SGD significantly improves performance.
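To make the selective-noise idea concrete, below is a minimal sketch of one PIDP-SGD-style update, not the authors' implementation. It assumes per-instance influence scores have already been computed via the influence function; the names `grad_fn`, `influence`, `tau`, `clip_norm`, and `sigma` are hypothetical. Following standard DP-SGD, every per-example gradient is clipped, but Gaussian noise is added only to the aggregated gradient of instances whose measured contribution exceeds the threshold `tau`.

```python
# Illustrative sketch of a selective-noise DP-SGD step (assumptions labeled
# above): noise is injected only for high-influence training instances.
import numpy as np

rng = np.random.default_rng(0)

def pidp_sgd_step(w, grad_fn, batch, influence, tau,
                  lr=0.1, clip_norm=1.0, sigma=1.0):
    """One update. `influence[i]` is a precomputed IF score for batch[i]."""
    noisy_sum = np.zeros_like(w)   # gradients that will receive DP noise
    clean_sum = np.zeros_like(w)   # low-influence gradients, left noise-free
    n_noisy = 0
    for i, x in enumerate(batch):
        g = grad_fn(w, x)
        # Clip each per-example gradient to bound sensitivity (standard DP-SGD).
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)
        if influence[i] > tau:
            noisy_sum += g
            n_noisy += 1
        else:
            # Contribution deemed too small to leak information: skip the noise.
            clean_sum += g
    if n_noisy > 0:
        # Calibrated Gaussian noise added once to the high-influence aggregate.
        noisy_sum += rng.normal(0.0, sigma * clip_norm, size=w.shape)
    return w - lr * (noisy_sum + clean_sum) / len(batch)
```

Because low-influence instances contribute their gradients without noise, the update is less perturbed than uniform DP-SGD; the choice of `tau` governs the trade-off this sketch leaves open.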