Differentially Private Stream Processing at Scale - 专知 Papers

Differential · Differential privacy · Streaming data · Design · Stream processing systems

March 31, 2023

Differentially Private Stream Processing at Scale


Bing Zhang,Vadym Doroshenko,Peter Kairouz,Thomas Steinke,Abhradeep Thakurta,Ziyin Ma,Himani Apte,Jodi Spacek

We design, to the best of our knowledge, the first differentially private (DP) stream processing system at scale. Our system, Differential Privacy SQL Pipelines (DP-SQLP), is built using a streaming framework similar to Spark Streaming, on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances: we (i) design a novel DP key selection algorithm that can operate on an unbounded set of possible keys and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the system's efficacy by obtaining at least a $16\times$ reduction in error over the meaningful baselines we consider.
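
To make item (iii) more concrete, here is a minimal sketch of the classic binary tree mechanism from the DP continual observation literature, applied to the running count of a single key. It is not the DP-SQLP implementation, which the abstract does not describe in detail. The names `BinaryTreeCounter`, `horizon`, and `laplace` are hypothetical, and the sketch assumes event-level privacy, updates bounded in [0, 1], and a stream length known in advance.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BinaryTreeCounter:
    """Hypothetical helper: DP running count for ONE key (binary tree mechanism).

    Assumption: each update lies in [0, 1] and at most `horizon` updates arrive.
    """

    def __init__(self, horizon: int, epsilon: float, seed=None):
        if horizon < 1 or epsilon <= 0:
            raise ValueError("horizon must be >= 1 and epsilon must be > 0")
        self.horizon = horizon
        # Each update is covered by at most one partial sum per tree level.
        self.levels = horizon.bit_length()
        # Splitting epsilon evenly across the levels gives this Laplace scale.
        self.scale = self.levels / epsilon
        self.exact = [0.0] * self.levels  # exact partial sums, never released
        self.noisy = [0.0] * self.levels  # noisy partial sums, used for output
        self.t = 0
        self.rng = random.Random(seed)

    def update(self, x: float) -> float:
        """Consume one value and return a noisy count of everything seen so far."""
        if not 0.0 <= x <= 1.0:
            raise ValueError("each update must lie in [0, 1]")
        if self.t >= self.horizon:
            raise ValueError("stream is longer than the declared horizon")
        self.t += 1
        # Index of the lowest set bit of t: the level whose partial sum closes now.
        i = (self.t & -self.t).bit_length() - 1
        # The new partial sum absorbs all lower levels plus the new value.
        self.exact[i] = sum(self.exact[:i]) + x
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        # Fresh noise is added once per partial sum, not once per output.
        self.noisy[i] = self.exact[i] + laplace(self.scale, self.rng)
        # The prefix count is the sum of partial sums at the set bits of t.
        return sum(self.noisy[j] for j in range(self.levels) if (self.t >> j) & 1)


# Usage: one counter per key; a histogram would keep a dict of such counters.
counter = BinaryTreeCounter(horizon=8, epsilon=1.0, seed=0)
estimates = [counter.update(1.0) for _ in range(8)]
```

Each released count sums at most `levels` noisy partial sums, so the error grows polylogarithmically with the stream length instead of accumulating fresh noise at every step. A full histogram would additionally need contribution bounding per user and a private key selection step, both of which are outside this sketch.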


