We propose and analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints. Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution ($m \ge 1$ samples), providing more stringent but more realistic protection against information leaks. We show that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as $O(1/\sqrt{m})$ as users provide more samples. In contrast, when the number of users $n$ increases, the privacy cost decreases at a faster $O(1/n)$ rate. We complement these results with lower bounds showing the worst-case optimality of our algorithms for mean estimation and stochastic convex optimization. Our algorithms rely on novel techniques for private mean estimation in arbitrary dimension whose error scales with the concentration radius $\tau$ of the distribution rather than the entire range. Under uniform convergence, we derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
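To make the user-level setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of user-level DP mean estimation: each user's $m$ samples are averaged into a single point, the per-user averages are clipped to a ball of radius $\tau$ around a coarse center estimate (assumed known here for simplicity), and Gaussian noise calibrated to the resulting sensitivity $2\tau/n$ is added. The noise scale is proportional to $\tau$ rather than the data range, and decays as $1/n$ in the number of users; the function name and parameters are hypothetical.

```python
import numpy as np

def user_level_private_mean(user_data, tau, center, epsilon, delta):
    """Illustrative sketch of user-level DP mean estimation.

    user_data : list of n arrays, each of shape (m, d) -- one user's samples.
    tau       : assumed concentration radius of the per-user averages.
    center    : coarse center estimate (assumed given in this sketch).
    """
    n = len(user_data)
    # Collapse each user's contribution into one point: their sample mean.
    user_means = np.array([x.mean(axis=0) for x in user_data])
    # Clip each user mean to the ball of radius tau around the center,
    # so that replacing one entire user moves the average by at most 2*tau/n.
    diffs = user_means - center
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = center + diffs * np.minimum(1.0, tau / np.maximum(norms, 1e-12))
    # Gaussian mechanism calibrated to the user-level sensitivity.
    sensitivity = 2.0 * tau / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    d = user_means.shape[1]
    return clipped.mean(axis=0) + np.random.normal(0.0, sigma, size=d)
```

Because the noise standard deviation is $O(\tau/n)$, the privacy cost of this sketch shrinks linearly in the number of users, mirroring the $O(1/n)$ rate stated above; the $O(1/\sqrt{m})$ dependence enters because averages of $m$ samples concentrate in a radius $\tau$ that itself shrinks with $m$.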