Privacy Budget Scheduling

Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside traditional compute resources such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution, whereas privacy budget cannot. This distinction forces a redesign of the scheduler. We present DPF (Dominant Private Block Fairness), a variant of the popular Dominant Resource Fairness (DRF) algorithm that is geared toward the non-replenishable privacy resource but enjoys theoretical properties similar to those of DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
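To make the idea of a non-replenishable privacy resource concrete, here is a minimal Python sketch. It is not the DPF algorithm from the paper and not PrivateKube's API. It assumes basic composition (epsilon costs simply add up), an all-or-nothing grant per task, and an ordering by smallest dominant share borrowed from DRF. The names `PrivacyBlock`, `dominant_share`, and `schedule` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PrivacyBlock:
    """Hypothetical data block with a fixed epsilon budget that is never refilled."""
    capacity: float
    consumed: float = 0.0

    @property
    def remaining(self) -> float:
        return self.capacity - self.consumed


def dominant_share(demand: dict[str, float], blocks: dict[str, PrivacyBlock]) -> float:
    """Largest fraction of any single block's total budget that a task requests."""
    return max(eps / blocks[name].capacity for name, eps in demand.items())


def schedule(tasks: dict[str, dict[str, float]],
             blocks: dict[str, PrivacyBlock]) -> list[str]:
    """Grant tasks in order of increasing dominant share (assumed ordering rule).

    A task is granted only if every block it names can cover its full demand.
    """
    granted = []
    for task_id in sorted(tasks, key=lambda t: dominant_share(tasks[t], blocks)):
        demand = tasks[task_id]
        if all(blocks[b].remaining >= eps - 1e-12 for b, eps in demand.items()):
            for b, eps in demand.items():
                # Unlike CPU or memory, this budget is not returned when the task ends.
                blocks[b].consumed += eps
            granted.append(task_id)
    return granted


# Illustrative workload: epsilon demanded per block by each training task.
blocks = {"b1": PrivacyBlock(1.0), "b2": PrivacyBlock(1.0)}
tasks = {
    "big": {"b1": 0.8},
    "small1": {"b1": 0.3, "b2": 0.3},
    "small2": {"b1": 0.3},
    "small3": {"b2": 0.5},
}
granted = schedule(tasks, blocks)
print(granted)  # ['small1', 'small2', 'small3']
```

In this toy workload, serving the tasks with small dominant shares first lets three models train, whereas granting `big` first would leave room for only one of the tasks that need `b1`. The scheduler described in the abstract additionally supports Rényi DP accounting, which this sketch does not model.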
