In this paper, we study the problem of estimating smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. In the first part of the paper we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by the Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is ${O}(p \alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is ${O}(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, under some mild assumptions and if $\alpha$ is not too small ({\em i.e.,} $\alpha\geq \Omega(\frac{1}{\sqrt{p}})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets.
翻译:在本文的第一部分, 我们研究如何估算平滑通用线性模型( GLM) 在非互动的本地差异隐私( NLDP) 模式中 。 不同于其古典设置, 我们的模型允许服务器访问一些额外的公共数据, 但是没有标签。 在本文的第一部分, 我们关注 GLM 。 具体地说, 我们首先考虑每个数据记录为 i. i. d. 从零度的多变量 Gaussia 分布的样本 。 在 Stein's Lemmma 的样本中, 我们提出了$( epl,\ delta) $- NLDP 算法。 此外, 用于算法的公共和私人数据的样本复杂性, $_ 2美元( 高概率) 是 $( p\ pha) 和 $tildea. (pl) 问题( lif3\ lip) 的样本 。 如果 = $( listal_ d) i) 和 AL_ dromoal 数据 的模型中, i- droupal 。 这是以前一个显著的数据( lax) 。