Estimating the quantiles of a large dataset is a fundamental problem in both the streaming algorithms literature and the differential privacy literature. However, all existing private mechanisms for distribution-independent quantile computation require space at least linear in the input size $n$. In this work, we devise a differentially private algorithm for the quantile estimation problem with strongly sublinear space complexity, in both the one-shot and continual observation settings. Our basic mechanism estimates any $\alpha$-approximate quantile of a length-$n$ stream over a data universe $\mathcal{X}$ with probability $1-\beta$ using $O\left( \frac{\log (|\mathcal{X}|/\beta) \log (\alpha \epsilon n)}{\alpha \epsilon} \right)$ space, while satisfying $\epsilon$-differential privacy at a single time point. Our approach builds upon deterministic streaming algorithms for non-private quantile estimation, instantiating the exponential mechanism with a utility function defined on sketch items while (privately) sampling from intervals defined by the sketch. We also present a second algorithm, based on histograms, that is especially suited to estimating multiple quantiles. We implement our algorithms and evaluate them experimentally on synthetic and real-world datasets.
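The abstract describes running the exponential mechanism over intervals defined by sketch items and then sampling a point inside the chosen interval. The code below is a minimal sketch of that idea under stated assumptions, not the authors' mechanism. It assumes the deterministic sketch (for example, a Greenwald-Khanna summary) is given simply as a sorted list of `(value, estimated_rank)` pairs lying within known universe bounds `lower < upper`. It scores each gap by the negative distance between its left endpoint's estimated rank and the target rank $qn$, and it assumes this utility has sensitivity 1, which is not established here. The function name `exp_mech_quantile` and its parameters are hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def exp_mech_quantile(
    items: Sequence[Tuple[float, float]],  # hypothetical sketch: (value, estimated rank), sorted by value
    n: int,
    q: float,
    epsilon: float,
    lower: float,
    upper: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Sample an approximate q-quantile via the exponential mechanism over sketch intervals.

    Assumptions (illustrative only): item values lie in [lower, upper] with lower < upper,
    and the rank-based utility below has sensitivity 1.
    """
    rng = rng or random.Random()
    # Interval endpoints: the universe bounds plus the sketch values.
    values = [lower] + [v for v, _ in items] + [upper]
    # Estimated rank at the left endpoint of each interval (0 at the lower bound).
    ranks = [0.0] + [r for _, r in items]
    target = q * n

    log_weights = []
    for i in range(len(values) - 1):
        length = values[i + 1] - values[i]
        if length <= 0:
            # Empty interval (duplicate values): it can never be selected.
            log_weights.append(-math.inf)
            continue
        utility = -abs(ranks[i] - target)
        # Exponential mechanism: weight = length * exp(eps * utility / (2 * sensitivity)),
        # with sensitivity assumed to be 1. Kept in log space to avoid overflow.
        log_weights.append(math.log(length) + epsilon * utility / 2.0)

    # Normalize by the maximum log-weight before exponentiating.
    m = max(log_weights)
    weights = [math.exp(w - m) for w in log_weights]
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample uniformly inside the chosen interval.
    return rng.uniform(values[i], values[i + 1])


if __name__ == "__main__":
    sketch = [(10.0, 25.0), (20.0, 50.0), (30.0, 75.0)]
    print(exp_mech_quantile(sketch, n=100, q=0.5, epsilon=1.0, lower=0.0, upper=40.0))
```

Weighting each interval by its length, rather than picking a stored sketch item directly, means the released value is a fresh sample from the universe rather than an element of the input; with a large `epsilon`, the output concentrates in the interval whose estimated rank is closest to $qn$.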