We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added to guarantee DP, but its standard deviation is estimated online. Compared to existing $(\epsilon, \delta)$-DP techniques, which have sub-optimal error bounds, DP-AMBSSGD is able to provide nearly optimal error bounds in terms of key parameters like the dimensionality $d$, the number of points $N$, and the standard deviation $\sigma$ of the noise in the observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, the excess error of DP-AMBSSGD due to privacy is $\frac{\sigma^2 d}{N}\big(1+\frac{d}{\epsilon^2 N}\big)$, i.e., the bound is meaningful when the number of samples $N = \Omega(d \log d)$, which is the standard operative regime for linear regression. In contrast, error bounds for existing efficient methods in this setting are $\mathcal{O}\big(\frac{d^3}{\epsilon^2 N^2}\big)$, even for $\sigma=0$. That is, for constant $\epsilon$, the existing techniques require $N=\Omega(d\sqrt{d})$ to provide a non-trivial result.
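To make the algorithmic idea concrete, the following minimal Python sketch shows a generic one-pass noisy mini-batch SGD for linear regression with per-example gradient clipping and Gaussian noise. It is not the paper's implementation: the function name `dp_one_pass_minibatch_sgd` and the hyperparameters (`batch_size`, `eta`, `noise_multiplier`, `clip_norm`) are illustrative assumptions, and in DP-AMBSSGD the clipping threshold / noise scale is instead estimated online from the current error rather than fixed in advance.

```python
import numpy as np

def dp_one_pass_minibatch_sgd(X, y, batch_size, eta, noise_multiplier, clip_norm, rng=None):
    """Illustrative one-pass noisy mini-batch SGD for linear regression.

    Each point is visited exactly once (a single shuffled pass, i.e. sampling
    without replacement). Per-example gradients are clipped to `clip_norm`,
    and Gaussian noise with standard deviation
    `noise_multiplier * clip_norm / batch_size` is added to each averaged
    gradient, following the standard Gaussian-mechanism recipe.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = np.zeros(d)

    perm = rng.permutation(n)                      # one shuffled pass over the data
    for start in range(0, n - batch_size + 1, batch_size):
        idx = perm[start:start + batch_size]
        residuals = X[idx] @ w - y[idx]            # shape (batch_size,)
        grads = residuals[:, None] * X[idx]        # per-example gradients, (batch_size, d)

        # Clip each per-example gradient to bound its sensitivity.
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

        # Average, then add Gaussian noise calibrated to the clipping threshold.
        noise_std = noise_multiplier * clip_norm / batch_size
        g = grads.mean(axis=0) + rng.normal(0.0, noise_std, size=d)

        w -= eta * g
    return w
```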