Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting - 专知论文


INFORMS · Annotation · Identifiable · Data labeling · NLP

March 6, 2023

Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting


Nina Mouhammad, Johannes Daxenberger, Benjamin Schiller, Ivan Habernal

Most tasks in NLP require labeled data. Data labeling is often done on crowdsourcing platforms for scalability reasons. However, data can be published on public platforms only if it contains no privacy-relevant information, and textual data often contains sensitive information such as person names or locations. In this work, we investigate how removing personally identifiable information (PII) and applying differential privacy (DP) rewriting can enable text with privacy-relevant information to be used for crowdsourcing. We find that DP rewriting before crowdsourcing can preserve privacy while still yielding good label quality for certain tasks and data. PII removal led to good label quality in all examined tasks; however, it provides no privacy guarantees.
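To make the PII-removal idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that PII can be caught with two hypothetical regular expressions (email addresses and phone numbers) and replaced with placeholders before text is sent to crowdworkers. The names `PII_PATTERNS` and `redact_pii` are invented for illustration. Person names and locations, which the abstract mentions, would typically need a named-entity recognizer rather than regular expressions.

```python
import re

# Hypothetical patterns, chosen only for illustration. They are not taken
# from the paper and will miss many kinds of PII (names, locations, IDs).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace every pattern match with a bracketed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.org or +49 151 2345678."))
```

Pattern-based redaction of this kind only removes what the patterns happen to match, which is one way to see why PII removal, unlike DP rewriting, comes with no formal privacy guarantee.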

