Quantiles are often used for summarizing and understanding data. If that data is sensitive, it may be necessary to compute quantiles in a way that is differentially private, providing theoretical guarantees that the result does not reveal private information. However, when multiple quantiles are needed, existing differentially private algorithms fare poorly: they either compute quantiles individually, splitting the privacy budget, or summarize the entire distribution, wasting effort. In either case the result is reduced accuracy. In this work we propose an instance of the exponential mechanism that simultaneously estimates exactly $m$ quantiles from $n$ data points while guaranteeing differential privacy. The utility function is carefully structured to allow for an efficient implementation that returns estimates of all $m$ quantiles in time $O(mn\log(n) + m^2n)$. Experiments show that our method significantly outperforms the current state of the art on both real and synthetic data while remaining efficient enough to be practical.
翻译:量化法通常用于总结和理解数据。 如果数据是敏感的, 则可能有必要以不同私人的方式计算量化法, 提供理论保证结果不会披露私人信息。 但是, 当需要多个量化法时, 现有的差异私人算法效果很差: 它们要么单独计算量化法, 分割隐私预算, 要么总结整个分布, 浪费努力。 在这两种情况下, 结果都会降低准确性 。 在这项工作中, 我们提议了一个指数机制的例子, 在保证差异隐私的同时, 从 $n 数据点同时精确估算 $ m m Qunticles 。 实用功能结构谨慎, 以允许高效地执行, 以时间 $O( mn\log) + m ⁇ 2n 来返回所有 $ $ 美元 的量化法估计数 。 实验显示, 我们的方法在实际和合成数据上都大大超越了当前的艺术状态, 同时保持足够实用性 。