通过自动批注和联邦学习对个人信息采掘采取隐私保护办法 (A Privacy-Preserving Approach to Extraction of Personal Information through Automatic Annotation and Federated Learning) - 专知论文

会员服务 ·

0

INFORMS · 联邦学习 · 命名实体识别 · MoDELS · 学成 ·

2021 年 5 月 19 日

A Privacy-Preserving Approach to Extraction of Personal Information through Automatic Annotation and Federated Learning

翻译：通过自动批注和联邦学习对个人信息采掘采取隐私保护办法

Rajitha Hathurusinghe,Isar Nejadgholi,Miodrag Bolic

We curated WikiPII, an automatically labeled dataset composed of Wikipedia biography pages, annotated for personal information extraction. Although automatic annotation can lead to a high degree of label noise, it is an inexpensive process and can generate large volumes of annotated documents. We trained a BERT-based NER model with WikiPII and showed that with an adequately large training dataset, the model can significantly decrease the cost of manual information extraction, despite the high level of label noise. In a similar approach, organizations can leverage text mining techniques to create customized annotated datasets from their historical data without sharing the raw data for human annotation. Also, we explore collaborative training of NER models through federated learning when the annotation is noisy. Our results suggest that depending on the level of trust to the ML operator and the volume of the available data, distributed training can be an effective way of training a personal information identifier in a privacy-preserved manner. Research material is available at https://github.com/ratmcu/wikipiifed.

翻译：我们开发了由维基百科传记页组成的自动标签数据集WikiPII,这是一个由维基百科传记页组成的自动标签数据集,供个人信息提取之用。虽然自动注解可导致高程度的标签噪音,但这是一个廉价的过程,可以产生大量附加说明的文件。我们与维基百科一起培训了一个基于BERT的NER模型,并表明,有了足够大的培训数据集,该模型可以大大减少人工信息提取的成本,尽管标签噪音很大。在类似办法中,各组织可以利用文本挖掘技术,从历史数据中创建定制的附加说明的数据集,而不必为人类注解分享原始数据。此外,我们还探索通过在注解噪音时的联结学习对NER模型进行协作培训。我们的结果表明,根据对ML操作员的信任程度和现有数据的数量,分发的培训可以成为以保密方式培训个人信息标识的有效方式。研究材料可在https://github.com/ratmcu/wikipifed上查阅。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

最新《联邦学习Federated Learning》报告，Federated Learning

最新《联邦学习Federated Learning》报告，Federated Learning

专知会员服务

91+阅读 · 2020年12月2日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

专知会员服务

81+阅读 · 2020年5月20日

元学习(meta learning) 最新进展综述论文

元学习(meta learning) 最新进展综述论文

专知会员服务

281+阅读 · 2020年5月8日

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

专知会员服务

68+阅读 · 2020年4月28日

【经典书】深度学习，532页pdf，Deep Learning - A Practitioner's Approach

【经典书】深度学习，532页pdf，Deep Learning - A Practitioner's Approach

专知会员服务

138+阅读 · 2020年4月3日

【论文推荐】保护隐私的协同过滤综述，Survey of Privacy-Preserving Collaborative Filtering

【论文推荐】保护隐私的协同过滤综述，Survey of Privacy-Preserving Collaborative Filtering

专知会员服务

36+阅读 · 2020年3月19日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

专知会员服务

15+阅读 · 2019年11月18日

最新《联邦学习Federated Learning》报告，47页ppt

最新《联邦学习Federated Learning》报告，47页ppt

专知

48+阅读 · 2020年12月2日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

专知

5+阅读 · 2020年4月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新七篇推荐系统相关论文—协同度量学习、SQL-Rank、用户行为与神经网络、隐私价格、贝叶斯、 IoT、序列感知

【论文推荐】最新七篇推荐系统相关论文—协同度量学习、SQL-Rank、用户行为与神经网络、隐私价格、贝叶斯、 IoT、序列感知

专知

10+阅读 · 2018年3月9日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

UAV-assisted Online Machine Learning over Multi-Tiered Networks: A Hierarchical Nested Personalized Federated Learning Approach

Arxiv

0+阅读 · 2021年7月11日

A Federated Semi-Supervised Learning Approach for Network Traffic Classification

A Federated Semi-Supervised Learning Approach for Network Traffic Classification

Arxiv

0+阅读 · 2021年7月8日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Privacy and Robustness in Federated Learning: Attacks and Defenses

Arxiv

35+阅读 · 2020年12月7日

Contrastive Learning with Adversarial Examples

Arxiv

5+阅读 · 2020年10月22日

One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

Arxiv

3+阅读 · 2019年6月6日

Span Based Open Information Extraction

Arxiv

3+阅读 · 2019年3月1日

A generic framework for privacy preserving deep learning

Arxiv

6+阅读 · 2018年11月13日

Scale Up Event Extraction Learning via Automatic Training Data Generation

Arxiv

7+阅读 · 2017年12月11日

VIP会员

文章信息

相关主题

命名实体识别

相关VIP内容

最新《联邦学习Federated Learning》报告，Federated Learning

最新《联邦学习Federated Learning》报告，Federated Learning

专知会员服务

91+阅读 · 2020年12月2日

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

【北京大学】Locally Differentially Private (Contextual) Bandits Learning

专知会员服务

13+阅读 · 2020年6月8日

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

机器学习隐私综述论文，An Overview of Privacy in Machine Learning

专知会员服务

81+阅读 · 2020年5月20日

元学习(meta learning) 最新进展综述论文

元学习(meta learning) 最新进展综述论文

专知会员服务

281+阅读 · 2020年5月8日

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

专知会员服务

68+阅读 · 2020年4月28日

【经典书】深度学习，532页pdf，Deep Learning - A Practitioner's Approach

【经典书】深度学习，532页pdf，Deep Learning - A Practitioner's Approach

专知会员服务

138+阅读 · 2020年4月3日

【论文推荐】保护隐私的协同过滤综述，Survey of Privacy-Preserving Collaborative Filtering

【论文推荐】保护隐私的协同过滤综述，Survey of Privacy-Preserving Collaborative Filtering

专知会员服务

36+阅读 · 2020年3月19日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

【文献综述】分布式机器学习综述论文，33页pdf，A Survey on Distributed Machine Learning

专知会员服务

124+阅读 · 2019年12月23日

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

专知会员服务

15+阅读 · 2019年11月18日

热门VIP内容

开通专知VIP会员享更多权益服务

数据要素发展报告(2025年)：附下载

人工智能代理提升战时舰船战备水平

【NeurIPS2025教程】大语言模型规划

NeurIPS 2025 教程：深度学习训练不稳定性的理论洞见

相关资讯

最新《联邦学习Federated Learning》报告，47页ppt

最新《联邦学习Federated Learning》报告，47页ppt

专知

48+阅读 · 2020年12月2日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

【UCSD-MIT】深度学习隐私综述论文，Privacy in Deep Learning: A Survey

专知

5+阅读 · 2020年4月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新七篇推荐系统相关论文—协同度量学习、SQL-Rank、用户行为与神经网络、隐私价格、贝叶斯、 IoT、序列感知

【论文推荐】最新七篇推荐系统相关论文—协同度量学习、SQL-Rank、用户行为与神经网络、隐私价格、贝叶斯、 IoT、序列感知

专知

10+阅读 · 2018年3月9日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

相关论文

UAV-assisted Online Machine Learning over Multi-Tiered Networks: A Hierarchical Nested Personalized Federated Learning Approach

Arxiv

0+阅读 · 2021年7月11日

A Federated Semi-Supervised Learning Approach for Network Traffic Classification

A Federated Semi-Supervised Learning Approach for Network Traffic Classification

Arxiv

0+阅读 · 2021年7月8日

Model-Contrastive Federated Learning

Arxiv

10+阅读 · 2021年3月30日

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Arxiv

10+阅读 · 2021年1月24日

Privacy and Robustness in Federated Learning: Attacks and Defenses

Arxiv

35+阅读 · 2020年12月7日

Contrastive Learning with Adversarial Examples

Arxiv

5+阅读 · 2020年10月22日

One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

Arxiv

3+阅读 · 2019年6月6日

Span Based Open Information Extraction

Arxiv

3+阅读 · 2019年3月1日

A generic framework for privacy preserving deep learning

Arxiv

6+阅读 · 2018年11月13日

Scale Up Event Extraction Learning via Automatic Training Data Generation

Arxiv

7+阅读 · 2017年12月11日

微信扫码咨询专知VIP会员