This paper presents a new protocol for solving the private heavy-hitters problem. In this problem, there are many clients and a small set of data-collection servers. Each client holds a private bitstring. The servers want to recover the set of all popular strings, without learning anything else about any client's string. A web-browser vendor, for instance, can use our protocol to figure out which homepages are popular, without learning any user's homepage. We also consider the simpler private subset-histogram problem, in which the servers want to count how many clients hold strings in a particular set without revealing this set to the clients. Our protocols use two data-collection servers and, in a protocol run, each client sends only a single message to the servers. Our protocols protect client privacy against arbitrary misbehavior by one of the servers, and our approach requires neither public-key cryptography (except for secure channels) nor general-purpose multiparty computation. Instead, we rely on incremental distributed point functions, a new cryptographic tool that allows a client to succinctly secret-share the labels on the nodes of an exponentially large binary tree, provided that the tree has a single non-zero path. Along the way, we develop new general tools for providing malicious security in applications of distributed point functions. In an experimental evaluation with two servers on opposite sides of the U.S., the servers can find the 200 most popular strings among a set of 400,000 client-held 256-bit strings in 54 minutes. Our protocols are highly parallelizable. We estimate that with 20 physical machines per logical server, our protocols could compute heavy hitters over ten million clients in just over one hour of computation.
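To make the tree-based idea concrete, the following toy Python sketch shows a non-succinct stand-in for what an incremental distributed point function achieves. It is not the paper's construction: each client here additively shares a 0/1 label for every node of a small prefix tree, which takes space exponential in the string length, whereas the actual tool compresses these shares into a short key. The modulus, the function names (`share_string`, `heavy_hitters`), and the 4-bit example strings are all assumptions chosen for illustration. The sketch also omits the malicious-security checks, and the servers simply combine their per-prefix sums in the clear.

```python
import secrets
from itertools import product

MODULUS = 2**61 - 1  # hypothetical modulus for additive shares (an assumption)

def all_prefixes(n_bits):
    """Yield every non-empty prefix (tree node) for strings of n_bits bits."""
    for length in range(1, n_bits + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

def share_string(s):
    """Client side: additively share a 0/1 label for every prefix-tree node.

    Only nodes on the path spelled out by s carry label 1, so the tree has
    a single non-zero path. Each share on its own is uniformly random.
    """
    share_a, share_b = {}, {}
    for prefix in all_prefixes(len(s)):
        label = 1 if s.startswith(prefix) else 0
        r = secrets.randbelow(MODULUS)
        share_a[prefix] = r
        share_b[prefix] = (label - r) % MODULUS
    return share_a, share_b

def heavy_hitters(shares_a, shares_b, n_bits, threshold):
    """Server side: descend the tree level by level, pruning rare prefixes."""
    candidates = [""]
    for _ in range(n_bits):
        survivors = []
        for parent in candidates:
            for bit in "01":
                prefix = parent + bit
                # Each server sums its own shares locally ...
                sum_a = sum(sh[prefix] for sh in shares_a) % MODULUS
                sum_b = sum(sh[prefix] for sh in shares_b) % MODULUS
                # ... and the two sums combine into the prefix count.
                count = (sum_a + sum_b) % MODULUS
                if count >= threshold:
                    survivors.append(prefix)
        candidates = survivors
    return candidates

# Hypothetical client inputs: six clients holding 4-bit strings.
client_strings = ["1010", "1010", "1010", "0111", "0111", "0001"]
pairs = [share_string(s) for s in client_strings]
shares_a = [a for a, _ in pairs]  # what server A receives
shares_b = [b for _, b in pairs]  # what server B receives

print(heavy_hitters(shares_a, shares_b, n_bits=4, threshold=2))
# prints ['0111', '1010']
```

Because a prefix can only be popular if its parent is popular, the servers never expand more than a bounded number of nodes per level, which is why the search stays feasible even for long strings once the per-client shares are succinct.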