Privacy and communication constraints are two major bottlenecks in federated learning (FL) and analytics (FA). We study the optimal accuracy of mean and frequency estimation (canonical models for FL and FA, respectively) under joint communication and $(\varepsilon, \delta)$-differential privacy (DP) constraints. We show that in order to achieve the optimal error under $(\varepsilon, \delta)$-DP, it is sufficient for each client to send $\Theta\left( n \min\left(\varepsilon, \varepsilon^2\right)\right)$ bits for FL and $\Theta\left(\log\left( n\min\left(\varepsilon, \varepsilon^2\right) \right)\right)$ bits for FA to the server, where $n$ is the number of participating clients. Without compression, each client needs $O(d)$ bits and $O(\log d)$ bits for the mean and frequency estimation problems, respectively (where $d$ corresponds to the number of trainable parameters in FL or the domain size in FA), so significant savings are possible in the regime $n \min\left(\varepsilon, \varepsilon^2\right) = o(d)$, which is often the relevant regime in practice. Our algorithms leverage compression for privacy amplification: when each client communicates only partial information about its sample, we show that privacy can be amplified by randomly selecting the part contributed by each client.
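To make the compression-for-amplification idea concrete, the following is a minimal NumPy sketch of distributed mean estimation in which each client reports only a small, randomly chosen subset of noisy coordinates. All names and parameter values (`n`, `d`, `k`, `sigma`) are illustrative assumptions, not taken from the paper, and the sketch omits the noise calibration and accounting needed for a formal $(\varepsilon, \delta)$-DP guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (hypothetical, not from the paper).
n, d = 1000, 256   # number of clients, dimension
k = 8              # coordinates each client actually transmits
sigma = 0.5        # per-coordinate Gaussian noise scale

# Each client holds a vector in the unit ball (rows of X).
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))

# Each client sends only k randomly chosen coordinates, privatized with
# Gaussian noise; the random selection of which part to contribute is
# the source of the claimed privacy amplification.
sums = np.zeros(d)
counts = np.zeros(d)
for x in X:
    idx = rng.choice(d, size=k, replace=False)  # random part of the sample
    sums[idx] += x[idx] + sigma * rng.normal(size=k)
    counts[idx] += 1

# Aggregate the partial, noisy reports into a mean estimate.
est = sums / np.maximum(counts, 1)
true_mean = X.mean(axis=0)
print("L2 error:", np.linalg.norm(est - true_mean))
```

In this sketch, each client reveals a noisy view of only $k$ of the $d$ coordinates (roughly a $k/d$ fraction of its sample), which stands in for the "randomly selected part" in the abstract; the server never sees which coordinates a client withheld until the report arrives.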