We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when increasing the number of users $n$, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the minimax optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension with error scaling as the concentration radius $\tau$ of the distribution rather than the entire range.
翻译:我们提出和分析算法,以解决用户一级差异隐私限制下的一系列学习任务。用户一级的DP不是仅仅保障单个样本的隐私,而是保护用户的全部贡献(mm\ge 1美元样本),提供更严格但更现实的保护,防止信息泄漏。我们显示,对于高维平均估算,用光滑损失、随机共振优化和学习假设等级来尽量减少实验风险,使用有限公吨的英特罗普,隐私成本降低为O(1/\sqrt{m})美元,因为用户提供了更多的样本。相比之下,当用户数量增加美元时,隐私成本降低的速度以更快的O(1/n)美元的速度。我们用较低的界限来补充这些结果,显示我们中值估算和随机共振动优化的算法的最小性最佳性。我们的算法依靠任意的私人中值估算新技术,错误比例是分配的浓度半值$\tau,而不是整个范围。