The exponential growth in the amount of available data makes taking advantage of it without violating users' privacy one of the fundamental problems of computer science in the 21st century. This question has been investigated thoroughly under the framework of differential privacy. However, most of the literature has not focused on settings where the amount of data is so large that we cannot compute the exact answer even in the non-private setting (such as the streaming setting, the sublinear-time setting, etc.). This can often make the use of differential privacy infeasible in practice. In this paper, we present a general approach for making Monte Carlo randomized approximation algorithms differentially private. We only need to assume that the error $R$ of the approximation algorithm is sufficiently concentrated around $0$ (e.g.\ $E[|R|]$ is bounded) and that the function being approximated has small global sensitivity $\Delta$. First, we show that if the error is subexponential, then the Laplace mechanism with noise magnitude proportional to the sum of $\Delta$ and the \emph{subexponential diameter} of the error of the algorithm makes the algorithm differentially private. This holds even if the worst-case global sensitivity of the algorithm is large or even infinite. We then introduce a new additive noise mechanism, which we call the zero-symmetric Pareto mechanism. We show that this mechanism makes an algorithm differentially private even if we only assume a bound on the first absolute moment of the error, $E[|R|]$. Finally, we use our results to give the first differentially private algorithms for various problems, including estimating frequency moments, estimating the average degree of a graph in sublinear time, and estimating the size of a maximum matching. Our results raise many new questions; we state multiple open problems.
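As a rough illustration of the abstract's two mechanisms (this is not code from the paper: the exact noise scales, and in particular the parameterization of the zero-symmetric Pareto distribution, are assumptions made for this sketch), both can be viewed as additive-noise wrappers around the output of an approximation algorithm:

```python
import math
import random

def laplace_noise(scale):
    """Sample centered Laplace noise via inverse-CDF sampling."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def symmetric_pareto_noise(scale, alpha):
    """Sample a zero-symmetric Pareto variable: a uniformly random sign times
    a magnitude with density proportional to (1 + t/scale)^(-alpha) on t >= 0.
    NOTE: this is one plausible parameterization, assumed for illustration;
    the paper's exact definition of the mechanism may differ."""
    u = random.random()
    magnitude = scale * ((1.0 - u) ** (-1.0 / (alpha - 1.0)) - 1.0)
    return random.choice((-1.0, 1.0)) * magnitude

def private_laplace_estimate(approx_value, delta, subexp_diameter, eps):
    """Laplace mechanism whose noise magnitude is proportional to the sum of
    the global sensitivity `delta` and the subexponential diameter of the
    approximation error, as described in the abstract."""
    return approx_value + laplace_noise((delta + subexp_diameter) / eps)
```

The point of the sketch is that privacy comes from calibrating the noise to $\Delta$ plus a concentration parameter of the algorithm's own error, rather than to the (possibly infinite) worst-case sensitivity of the algorithm itself.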