通过自动培训数据生成扩大采掘学习 (Scale Up Event Extraction Learning via Automatic Training Data Generation) - 专知论文

会员服务 ·

0

事件抽取 · 训练数据 · 知识提取 · 缩放 · Extensibility ·

2017 年 12 月 11 日

Scale Up Event Extraction Learning via Automatic Training Data Generation

翻译：通过自动培训数据生成扩大采掘学习

Ying Zeng,Yansong Feng,Rong Ma,Zheng Wang,Rui Yan,Chongde Shi,Dongyan Zhao

from arxiv, 8 pages, accepted by AAAI 2018

The task of event extraction has long been investigated in a supervised learning paradigm, which is bound by the number and the quality of the training instances. Existing training data must be manually generated through a combination of expert domain knowledge and extensive human involvement. However, due to drastic efforts required in annotating text, the resultant datasets are usually small, which severally affects the quality of the learned model, making it hard to generalize. Our work develops an automatic approach for generating training data for event extraction. Our approach allows us to scale up event extraction training instances from thousands to hundreds of thousands, and it does this at a much lower cost than a manual approach. We achieve this by employing distant supervision to automatically create event annotations from unlabelled text using existing structured knowledge bases or tables.We then develop a neural network model with post inference to transfer the knowledge extracted from structured knowledge bases to automatically annotate typed events with corresponding arguments in text.We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high quality training instances. We show that this large volume of training data not only leads to a better event extractor, but also allows us to detect multiple typed events.

翻译：长期以来,事件提取任务一直是在受监督的学习模式中调查的,这种模式受培训案例的数量和质量的约束。现有的培训数据必须通过专家领域知识的结合和广泛的人类参与来手工生成。然而,由于在说明文本方面需要付出巨大的努力,由此产生的数据集通常很小,对所学模型的质量造成不同程度的影响,难以概括。我们的工作为生成事件提取培训数据开发了一种自动方法。我们的方法使我们能够将事件提取培训案例从数千个增加到数十万个,而且这样做的成本比人工方法低得多。我们通过远程监督,利用现有结构化知识基础或表格自动从未贴标签的文本中生成事件说明。我们随后开发了一个神经网络模型,从结构化的知识库中提取的知识被自动转换为带有相应文本参数的预告事件。我们通过使用从Freebase提取的知识来评估我们的方法,将事件提取到贴上维基百科文章的标签。实验结果显示,我们的方法可以产生大量高质量的培训案例。我们显示,大量培训类型的数据能够更好地提取事件。

7

相关内容

事件抽取

事件抽取指的是从非结构化文本中抽取事件信息，并将其以结构化形式呈现出来的任务。例如从“毛泽东1893 年出生于湖南湘潭”这句话中抽取事件{类型：出生，人物：毛泽东，时间：1893 年，出生地：湖南湘潭}。事件抽取任务通常包含事件类型识别和事件元素填充两个子任务。

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【清华大学-腾讯】关系提取综述，Review and Outlook for Relation Extraction

【清华大学-腾讯】关系提取综述，Review and Outlook for Relation Extraction

专知会员服务

38+阅读 · 2020年4月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

From Data to Model Programming: Injecting Structured Priors for Knowledge Extraction，南加州大学计算机科学系任翔助理教授，CIPS ATT 16（2019）

From Data to Model Programming: Injecting Structured Priors for Knowledge Extraction，南加州大学计算机科学系任翔助理教授，CIPS ATT 16（2019）

专知会员服务

14+阅读 · 2019年10月25日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

AINLP

25+阅读 · 2019年8月21日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

开放知识图谱

26+阅读 · 2018年11月1日

春节充电系列：李宏毅2017机器学习课程学习笔记19之迁移学习（Transfer Learning）

春节充电系列：李宏毅2017机器学习课程学习笔记19之迁移学习（Transfer Learning）

专知

9+阅读 · 2018年3月5日

论文浅尝 | Reinforcement Learning for Relation Classification

论文浅尝 | Reinforcement Learning for Relation Classification

开放知识图谱

9+阅读 · 2017年12月10日

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Arxiv

5+阅读 · 2019年11月1日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Open Domain Event Extraction Using Neural Latent Variable Models

Open Domain Event Extraction Using Neural Latent Variable Models

Arxiv

4+阅读 · 2019年6月17日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Rapid Customization for Event Extraction

Rapid Customization for Event Extraction

Arxiv

7+阅读 · 2018年9月20日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

Correlated discrete data generation using adversarial training

Arxiv

5+阅读 · 2018年4月3日

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Arxiv

4+阅读 · 2017年12月23日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

相关VIP内容

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【清华大学-腾讯】关系提取综述，Review and Outlook for Relation Extraction

【清华大学-腾讯】关系提取综述，Review and Outlook for Relation Extraction

专知会员服务

38+阅读 · 2020年4月8日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

From Data to Model Programming: Injecting Structured Priors for Knowledge Extraction，南加州大学计算机科学系任翔助理教授，CIPS ATT 16（2019）

From Data to Model Programming: Injecting Structured Priors for Knowledge Extraction，南加州大学计算机科学系任翔助理教授，CIPS ATT 16（2019）

专知会员服务

14+阅读 · 2019年10月25日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《自适应训练辅助系统概念导论及其在空战指挥官加速培训中的应用》125页

《美陆军近战整合企业现代化计划（2025—2026）》最新报告

以色列-伊朗空战：短暂而激烈冲突的启示

《动态作战支援演习框架构建》80页

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

【论文】Awesome Relation Extraction Paper（关系抽取）（PART III）

AINLP

25+阅读 · 2019年8月21日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

开放知识图谱

26+阅读 · 2018年11月1日

春节充电系列：李宏毅2017机器学习课程学习笔记19之迁移学习（Transfer Learning）

春节充电系列：李宏毅2017机器学习课程学习笔记19之迁移学习（Transfer Learning）

专知

9+阅读 · 2018年3月5日

论文浅尝 | Reinforcement Learning for Relation Classification

论文浅尝 | Reinforcement Learning for Relation Classification

开放知识图谱

9+阅读 · 2017年12月10日

相关论文

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Arxiv

5+阅读 · 2019年11月1日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Open Domain Event Extraction Using Neural Latent Variable Models

Open Domain Event Extraction Using Neural Latent Variable Models

Arxiv

4+阅读 · 2019年6月17日

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning

Arxiv

21+阅读 · 2018年12月25日

Rapid Customization for Event Extraction

Rapid Customization for Event Extraction

Arxiv

7+阅读 · 2018年9月20日

Event Extraction with Generative Adversarial Imitation Learning

Arxiv

13+阅读 · 2018年4月21日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

Correlated discrete data generation using adversarial training

Arxiv

5+阅读 · 2018年4月3日

Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Arxiv

4+阅读 · 2017年12月23日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

微信扫码咨询专知VIP会员