We study the problem of multi-task learning under user-level differential privacy, in which $n$ users contribute data to $m$ tasks, each involving a subset of the users. One important aspect of the problem, which can significantly impact quality, is the distribution skew among tasks: certain tasks may have far fewer data samples than others, making them more susceptible to the noise added for privacy. It is natural to ask whether algorithms can adapt to this skew to improve overall utility. We give a systematic analysis of the problem by studying how to optimally allocate a user's privacy budget among tasks. We propose a generic algorithm based on an adaptive reweighting of the empirical loss, and show that in the presence of task distribution skew, this yields a quantifiable improvement in excess empirical risk. Experimental studies on recommendation problems that exhibit a long tail of small tasks demonstrate that our methods significantly improve utility, achieving the state of the art on two standard benchmarks.
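To make the reweighting idea concrete, the following is a minimal formalization; the notation ($U_t$, $\hat{L}_t$, $w_t$, $D_{u,t}$) is illustrative and not necessarily that of the paper's body:
\[
\hat{L}_w(\theta) \;=\; \sum_{t=1}^{m} w_t\, \hat{L}_t(\theta),
\qquad
\hat{L}_t(\theta) \;=\; \frac{1}{|U_t|} \sum_{u \in U_t} \ell_t\big(\theta;\, D_{u,t}\big),
\]
where $U_t$ denotes the users contributing to task $t$ and $D_{u,t}$ is user $u$'s data for that task. Under this (assumed) formulation, scaling task $t$'s per-user contribution by $w_t$ before a single joint norm clip reallocates the fixed per-user sensitivity, and hence the privacy budget, across tasks: raising $w_t$ for a data-poor task shrinks its relative noise at the expense of the better-populated tasks, while the total noise calibrated to the user-level clipping norm stays unchanged.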