Differentially Private n-gram Extraction - Zhuanzhi Papers


August 5, 2021

Differentially Private n-gram Extraction


Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin

We revisit the problem of $n$-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many $n$-grams as possible while preserving user-level privacy. Extracting $n$-grams is a fundamental subroutine in many NLP applications, such as sentence completion and email response generation. The problem also arises in other applications such as sequence mining, and it generalizes the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012).
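To make the problem statement concrete, the following is a minimal sketch of a simple count-and-threshold baseline for releasing $n$-grams under user-level privacy. It is not the algorithm developed in the paper: it omits the tree-based pruning, the DPSU weighting schemes, and the privacy accounting mentioned in the abstract. The function names, the `max_contrib` cap, and the threshold (derived from a union bound over the items that only one user contributes) are assumptions made for illustration, and the sketch should not be read as a vetted privacy proof.

```python
import math
import random
from collections import Counter

def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def dp_ngram_baseline(user_texts, n, epsilon, delta, max_contrib, rng):
    """Illustrative baseline: noisy user counts plus a release threshold.

    Assumes one text per user. Each user adds 1 to the count of at most
    max_contrib distinct n-grams, so one user changes the count vector
    by at most max_contrib in L1 norm.
    """
    scale = max_contrib / epsilon  # Laplace scale for L1 sensitivity max_contrib
    # Assumed threshold: an n-gram held by a single user should be
    # released with total probability at most delta (union bound).
    threshold = 1.0 + scale * math.log(max_contrib / (2.0 * delta))

    counts = Counter()
    for text in user_texts:
        grams = sorted(ngrams(text.split(), n))
        if len(grams) > max_contrib:
            # Cap the contribution using only this user's own data.
            grams = rng.sample(grams, max_contrib)
        counts.update(grams)

    released = set()
    for gram, count in counts.items():
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if count + noise > threshold:
            released.add(gram)
    return released

# Hypothetical toy corpus: one phrase shared by many users, one unique phrase.
rng = random.Random(0)
corpus = ["hello world again"] * 10000 + ["my secret phrase"]
released = dp_ngram_baseline(corpus, n=2, epsilon=1.0, delta=1e-6,
                             max_contrib=2, rng=rng)
```

In this toy run the bigrams shared by 10,000 users sit far above the threshold (about 28.6 for these parameters), while a bigram held by a single user clears it only with probability on the order of `delta`. The gap between this baseline and what can be released at the same privacy budget is the kind of utility the paper aims to improve.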


