There has been much recent work in the shuffle model of differential privacy, particularly for approximate $d$-bin histograms. While these protocols achieve low error, the number of messages sent by each user -- the message complexity -- has so far scaled with $d$ or the privacy parameters. The message complexity is an informative predictor of a shuffle protocol's resource consumption. We present a protocol whose message complexity is two when there are sufficiently many users. The protocol essentially pairs each row in the dataset with a fake row and performs a simple randomization on all rows. We show that the error introduced by the protocol is small, using rigorous analysis as well as experiments on real-world data. We also prove that corrupt users have a relatively low impact on our protocol's estimates.
翻译:不同隐私的洗牌模型最近做了很多工作, 特别是约美元- 宾方直方图。 虽然这些协议有低误, 但每个用户发送的信息数量 -- -- 信息复杂度 -- -- 迄今已经用美元或隐私参数缩放。 信息复杂度是洗牌协议资源消耗的信息预测器。 当用户数量足够多时, 我们提出了一个协议, 其信息复杂度是两个。 协议基本上将数据集中的每行配对成一个假行, 并对所有行进行简单的随机化。 我们显示协议引入的错误很小, 使用严格的分析和对真实世界数据的实验。 我们还证明腐败用户对我们协议估算的影响相对较低 。