Large language models have been shown to memorize private information, such as social security numbers, that appears in their training data. Given the sheer scale of the training corpus, screening and filtering such private data, whether manually or automatically, is challenging. In this paper, we propose Confidentially Redacted Training (CRT), a method for training language generation models while protecting confidential segments. We borrow ideas from differential privacy (which solves a related but distinct problem) and show that our method provably prevents unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct screening policy amplifies the confidentiality guarantee. We implement the method for both LSTM and GPT language models. Our experimental results show that models trained with CRT obtain almost the same perplexity while providing strong confidentiality.
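As a rough illustration of the idea of randomizing only the parts of training that touch confidential text, the sketch below shows a screening policy gating between a noised, clipped update and a plain update. This is a minimal sketch under assumed details, not the paper's CRT algorithm: the regex screening policy, the batch-level clipping (rather than per-example clipping as in standard DP-SGD), and the noise scale are hypothetical placeholders chosen for brevity.

```python
# Illustrative sketch only (assumed PyTorch setup); simplifies the idea of
# randomizing updates on batches flagged as containing confidential segments.
import re
import torch


def contains_confidential(batch_texts):
    # Hypothetical screening policy: flag batches containing an SSN-like
    # pattern. The paper argues that even an approximately correct policy
    # amplifies the confidentiality guarantee.
    return any(re.search(r"\b\d{3}-\d{2}-\d{4}\b", t) for t in batch_texts)


def randomized_step(model, loss, optimizer, clip_norm=1.0, noise_std=0.1):
    # DP-SGD-style randomized update: clip the gradient norm, then add
    # Gaussian noise before stepping (batch-level clipping for brevity).
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(torch.randn_like(p.grad) * noise_std)
    optimizer.step()


def plain_step(model, loss, optimizer):
    # Standard (non-randomized) update for batches with no flagged text.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a training loop, each batch would be routed to `randomized_step` or `plain_step` depending on `contains_confidential`, so that only the portions of training that touch flagged segments incur the randomization.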