In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\epsilon,\delta)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\epsilon}\log\frac{1}{\delta})$. Interestingly, our bound $O(\frac{1}{\epsilon}\log\frac{1}{\delta})$ on the number of users is independent of the dimension, unlike previous work, which requires a number of users that grows polynomially with the dimension; this solves a problem left open by Amin et al.~(ICML 2019). Our mechanism is robust: even if the data of up to $49\%$ of the users is corrupted, the final estimate remains approximately accurate. Finally, our results also extend to a broader range of problems, such as learning discrete distributions, stochastic convex optimization, empirical risk minimization, and a variant of stochastic gradient descent, via a reduction to differentially private mean estimation.
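To make the problem setting concrete, the following is a minimal sketch of a standard Gaussian-mechanism baseline for user-level private mean estimation: average each user's samples, clip the per-user means, average across users, and add noise calibrated to $(\epsilon,\delta)$-differential privacy. This is only an illustration of the setup, not the mechanism developed in this work; the function name and the `clip_norm` parameter are assumptions made for the sketch.

```python
import numpy as np

def user_level_gaussian_mean(user_samples, eps, delta, clip_norm):
    """Baseline (not this paper's mechanism): user-level DP mean estimate.

    user_samples: list of (m_i x d) arrays, one array per user.
    Each user contributes a single aggregate vector (their sample mean),
    so changing one user's data changes exactly one of these vectors.
    """
    n = len(user_samples)
    d = user_samples[0].shape[1]

    # One mean vector per user.
    user_means = np.stack([x.mean(axis=0) for x in user_samples])

    # Clip per-user means so replacing one user changes the sum
    # by at most 2 * clip_norm in L2 norm.
    norms = np.linalg.norm(user_means, axis=1, keepdims=True)
    clipped = user_means * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # L2 sensitivity of the average over n users.
    sensitivity = 2.0 * clip_norm / n

    # Standard Gaussian-mechanism calibration (valid for eps <= 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    return clipped.mean(axis=0) + np.random.normal(0.0, sigma, size=d)
```

Clipping bounds any single user's influence on the aggregate, which is what allows the noise scale to shrink with the number of users; the dimension-independent user requirement claimed above is a property of the paper's mechanism, not of this baseline.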