Differentially private (DP) release of multidimensional statistics typically considers an aggregate sensitivity, e.g. the vector norm of a high-dimensional vector. However, different dimensions of that vector can have widely different magnitudes, so DP perturbation affects the signal disproportionately across dimensions. We observe this problem in the gradient release of the DP-SGD algorithm when it is used for variational inference (VI), where it manifests as poor convergence and high variance in the outputs for certain variational parameters, and we make the following contributions: (i) We mathematically isolate the cause of the difference in magnitudes between the gradient parts corresponding to different variational parameters. Using this as prior knowledge, we establish a link between the gradients of the variational parameters and propose a simple yet efficient fix that yields a less noisy gradient estimator, which we call $\textit{aligned}$ gradients. This approach allows us to obtain the updates for the covariance parameter of a Gaussian posterior approximation without additional privacy cost. We compare it to alternative approaches that scale the gradients using analytically derived preconditioning, e.g. natural gradients. (ii) We suggest iterate averaging over the DP parameter traces recovered during training to reduce the DP-induced noise in the parameter estimates at no additional cost in privacy. Finally, (iii) to accurately capture the additional uncertainty DP introduces to the model parameters, we infer the DP-induced noise from the parameter traces and include it in the learned posteriors to make them $\textit{noise aware}$. We demonstrate the efficacy of the proposed improvements through experiments on real data.
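To make the link behind the aligned gradients concrete, the following is a minimal sketch under one standard setup, a mean-field Gaussian approximation $q_{m,\sigma}(\theta) = \mathcal{N}(m, \operatorname{diag}(\sigma^2))$ with the reparameterization trick; the notation here is illustrative and not fixed by the abstract:
\begin{align*}
\theta &= m + \sigma \odot \eta, \qquad \eta \sim \mathcal{N}(0, I),\\
\nabla_m \ell(\theta) &= \nabla_\theta \ell(\theta), \qquad \nabla_\sigma \ell(\theta) = \eta \odot \nabla_\theta \ell(\theta) = \eta \odot \nabla_m \ell(\theta),
\end{align*}
for any per-example loss $\ell$. Since $\eta$ is drawn independently of the data, a privatized mean gradient $\tilde{g}_m$ (clipped and noised as in DP-SGD) can be post-processed into $\tilde{g}_\sigma = \eta \odot \tilde{g}_m$, so the covariance-parameter update incurs no privacy cost beyond that of releasing $\tilde{g}_m$.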
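As a sketch of contributions (ii) and (iii), assume the optimizer stores the per-iteration variational-parameter iterates as a NumPy array; the names \texttt{postprocess\_trace} and \texttt{burn\_in} are hypothetical, and the trace-variance estimator below is only a crude stand-in for the paper's noise-inference procedure:
\begin{verbatim}
import numpy as np

def postprocess_trace(trace, burn_in):
    """Iterate averaging and trace-based noise estimation.

    trace: array of shape (T, d) with the DP-SGD iterates of a
    variational parameter. Both steps are post-processing of an
    already-released DP trace, so they add no privacy cost.
    """
    tail = trace[burn_in:]             # drop pre-convergence iterates
    avg = tail.mean(axis=0)            # iterate average: lower-variance estimate
    dp_var = tail.var(axis=0, ddof=1)  # crude estimate of DP-induced jitter
    return avg, dp_var

# Hypothetical usage: make the posterior noise aware by inflating its
# variance with the estimated DP-induced noise.
# m_hat, dp_var = postprocess_trace(m_trace, burn_in=500)
# noise_aware_var = sigma**2 + dp_var
\end{verbatim}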