The streaming model of computation is a popular approach for working with large-scale data. In this setting, items arrive as a stream, and the goal is to compute the desired quantities (usually data statistics) in a single pass over the stream while using as little space as possible. Motivated by the importance of data privacy, we develop differentially private streaming algorithms under the continual release setting, where the combined outputs of the algorithm across all timestamps must be differentially private. Specifically, we study the fundamental $\ell_p$ $(p\in [0,+\infty))$ frequency moment estimation problem in this setting and give an $\varepsilon$-DP algorithm that achieves a $(1+\eta)$-relative approximation $(\forall \eta\in(0,1))$ with $\mathrm{poly}\log(Tn)$ additive error and uses $\mathrm{poly}\log(Tn)\cdot \max(1, n^{1-2/p})$ space, where $T$ is the length of the stream and $n$ is the size of the universe of elements. Our space bound is near-optimal up to polylogarithmic factors, even in the non-private setting. To obtain these results, we first reduce several primitives under the differentially private continual release model, such as counting distinct elements, finding heavy hitters, and counting low-frequency elements, to the simpler counting/summing problems in the same setting. Building on these primitives, we develop a differentially private continual release level set estimation approach to address the $\ell_p$ frequency moment estimation problem. We also provide a simple extension of our results to the harder sliding window model, where the statistics must be maintained over the past $W$ data items.
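The counting/summing problem that the other primitives are reduced to can be illustrated with the standard binary (tree) mechanism for continual counting, due to Dwork et al. and to Chan, Shi, and Song. The sketch below is not the authors' algorithm; it only shows what an $\varepsilon$-DP continual counter looks like. It assumes each stream value lies in $[0, 1]$ and that $\varepsilon > 0$. Each value is added to at most one partial sum per level, and there are $L$ levels (the bit length of $T$), so adding Laplace noise of scale $L/\varepsilon$ to every released partial sum keeps the whole sequence of outputs $\varepsilon$-DP. The function name `binary_tree_counter` and its parameters are hypothetical.

```python
import math
import random


def binary_tree_counter(stream, epsilon, rng=None):
    """Release a noisy prefix sum after every item (binary mechanism sketch).

    Assumes values in [0, 1] and epsilon > 0. Hypothetical helper for illustration.
    """
    rng = rng or random.Random()
    T = len(stream)
    levels = max(1, T.bit_length())  # number of binary digits needed for any t <= T
    scale = levels / epsilon         # each item touches at most `levels` partial sums

    def laplace(b):
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

    alpha = [0.0] * levels  # exact partial sums, one per level
    noisy = [0.0] * levels  # noisy copies of those partial sums
    outputs = []
    for t, x in enumerate(stream, start=1):
        i = (t & -t).bit_length() - 1        # index of the lowest set bit of t
        alpha[i] = sum(alpha[:i]) + x        # merge lower-level sums into level i
        for j in range(i):                   # lower levels start fresh blocks
            alpha[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = alpha[i] + laplace(scale)
        # The prefix sum up to t is the sum of the blocks given by the binary digits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs


if __name__ == "__main__":
    # With a very large epsilon the noise is negligible, so the outputs
    # are approximately the true prefix sums [1, 1, 2, 3].
    print(binary_tree_counter([1, 0, 1, 1], epsilon=1e9, rng=random.Random(0)))
```

Because each prefix is assembled from at most $L$ noisy blocks, the additive error grows only polylogarithmically in $T$, which is the kind of guarantee the $\mathrm{poly}\log(Tn)$ additive error above relies on.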