Given an empirical distribution $f(x)$ of sensitive data $x$, we consider the task of minimizing $F(y) = D_{\text{KL}} (f(x)\Vert y)$ over the probability simplex, while protecting the privacy of $x$. We observe that, if we take the exponential mechanism and use the KL divergence as the loss function, the resulting algorithm is the Dirichlet mechanism, which outputs a single draw from a Dirichlet distribution. Motivated by this, we propose a R\'enyi differentially private (RDP) algorithm that employs the Dirichlet mechanism to solve the KL divergence minimization task. In addition, given $f(x)$ as above and an output $\hat{y}$ of the Dirichlet mechanism, we prove a probability tail bound on $D_{\text{KL}} (f(x)\Vert \hat{y})$, which we then use to derive a lower bound on the sample complexity of our RDP algorithm. Experiments on real-world datasets demonstrate the advantages of our algorithm over the Gaussian and Laplace mechanisms in supervised classification and maximum likelihood estimation.
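The observation linking the exponential mechanism to the Dirichlet mechanism admits a short derivation. The sketch below is illustrative only: it writes $\gamma > 0$ for the mechanism's privacy-dependent scaling constant and assumes the uniform base measure on the simplex, neither of which is fixed by the abstract. The exponential mechanism with loss $D_{\text{KL}}(f(x)\Vert y)$ samples $y$ with density proportional to
\[
\exp\bigl(-\gamma\, D_{\text{KL}}(f(x)\Vert y)\bigr)
= \exp\Bigl(-\gamma \sum_i f_i(x)\log f_i(x)\Bigr)\, \exp\Bigl(\gamma \sum_i f_i(x)\log y_i\Bigr)
\propto \prod_i y_i^{\gamma f_i(x)},
\]
since the entropy term is constant in $y$. The right-hand side is the kernel of a Dirichlet density with parameters $\alpha_i = \gamma f_i(x) + 1$, so releasing a single Dirichlet draw realizes the mechanism exactly.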