Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads where multiple ML models train on user data. Once used, the DP budget is forever consumed. It is therefore crucial to allocate it as efficiently as possible, so as to train as many models as possible. This paper presents a scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
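To build intuition for the privacy knapsack formulation, here is a minimal, hypothetical sketch of DP budget as a multidimensional capacity: each data block carries a finite privacy budget, each training task demands some epsilon from a subset of blocks, and the scheduler must pack tasks without exceeding any block's budget. The greedy profit-per-demand heuristic below is an illustrative baseline only, not the paper's DPK algorithm; all names and numbers are invented for the example.

```python
# Hypothetical sketch: privacy scheduling as a multidimensional knapsack.
# Each data block has a finite DP budget (capacity); each task demands
# some epsilon from a subset of blocks. A simple greedy heuristic
# (profit per unit of epsilon demanded) illustrates the packing
# problem; it is NOT the DPK algorithm from the paper.

def greedy_privacy_knapsack(capacities, tasks):
    """capacities: {block: epsilon_budget}.
    tasks: list of (name, profit, demands) where demands is
    {block: epsilon_requested}. Returns names of scheduled tasks."""
    remaining = dict(capacities)
    # Rank tasks by profit per unit of total epsilon demanded.
    ranked = sorted(tasks, key=lambda t: t[1] / sum(t[2].values()),
                    reverse=True)
    scheduled = []
    for name, profit, demands in ranked:
        # A task is admitted only if every block it touches still has
        # enough unspent budget.
        if all(remaining[b] >= eps for b, eps in demands.items()):
            for b, eps in demands.items():
                remaining[b] -= eps  # budget, once spent, is gone forever
            scheduled.append(name)
    return scheduled

caps = {"block0": 1.0, "block1": 1.0}
tasks = [
    ("A", 3.0, {"block0": 0.5}),
    ("B", 2.0, {"block0": 0.6, "block1": 0.6}),
    ("C", 1.0, {"block1": 0.4}),
]
print(greedy_privacy_knapsack(caps, tasks))  # → ['A', 'C']
```

Because the underlying problem is NP-hard, any such polynomial-time heuristic can be arbitrarily far from optimal on adversarial inputs; the paper's contribution is an approximation algorithm with much better packing behavior in practice.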