Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for discovering hidden semantics in text data and serves as a fundamental tool for text analysis in various applications. However, both the trained LDA model and the training process may expose the text information in the training data, raising significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the mainstream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis of the inherent differential privacy guarantee of CGS-based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics produced during CGS training. We also propose a locally private LDA training algorithm (LP-LDA) on crowdsourced data to provide local differential privacy for individual data contributors. Furthermore, we extend LP-LDA to an online version, OLP-LDA, to achieve LDA training on locally private mini-batches in a streaming setting. Extensive analysis and experimental results validate both the effectiveness and efficiency of our proposed privacy-preserving LDA training algorithms.
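To make the CGS training procedure referenced above concrete, the following is a minimal sketch of one collapsed Gibbs sweep for LDA with symmetric Dirichlet priors. The function name and count-array layout are illustrative assumptions, not the paper's implementation, and no explicit privacy mechanism is shown; the sampling randomness of the per-token conditional is the quantity that an inherent-privacy analysis of CGS would examine.

```python
import numpy as np

def cgs_sweep(docs, z, ndk, nkw, nk, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA (illustrative sketch).

    docs : list of word-id lists, one per document
    z    : list of topic-assignment lists, aligned with docs
    ndk  : (D, K) document-topic counts
    nkw  : (K, V) topic-word counts
    nk   : (K,)   per-topic token totals
    """
    K, V = nkw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the current token from all count statistics.
            ndk[d, k_old] -= 1
            nkw[k_old, w] -= 1
            nk[k_old] -= 1
            # Full conditional: p(z=k | rest) is proportional to
            # (ndk + alpha) * (nkw + beta) / (nk + V * beta).
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # Add the token back under its newly sampled topic.
            ndk[d, k_new] += 1
            nkw[k_new, w] += 1
            nk[k_new] += 1
            z[d][i] = k_new
    return z, ndk, nkw, nk
```

In a typical run, topic assignments are initialized uniformly at random, the count arrays are tallied from them, and `cgs_sweep` is iterated until the sampler mixes (e.g., with `rng = np.random.default_rng(0)`). The intermediate count statistics `ndk` and `nkw` are exactly the quantities that HDP-LDA protects in the centralized setting, while LP-LDA and OLP-LDA instead perturb each contributor's data locally before any such statistics are aggregated.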