LSOIE:监督公开信息提取的大型数据集 (LSOIE: A Large-Scale Dataset for Supervised Open Information Extraction)

Open Information Extraction (OIE) systems seek to compress the factual propositions of a sentence into a series of n-ary tuples. These tuples are useful for downstream tasks in natural language processing like knowledge base creation, textual entailment, and natural language understanding. However, current OIE datasets are limited in both size and diversity. We introduce a new dataset by converting the QA-SRL 2.0 dataset to a large-scale OIE dataset (LSOIE). Our LSOIE dataset is 20 times larger than the next largest human-annotated OIE dataset. We construct and evaluate several benchmark OIE models on LSOIE, providing baselines for future improvements on the task. Our LSOIE data, models, and code are made publicly available

翻译：开放信息提取系统(OIE)试图将句子的事实主张压缩为一系列n-ary tuples。这些图例有助于自然语言处理的下游任务,例如知识库的创建、文本要求和自然语言理解。然而,目前OIE的数据集在大小和多样性方面都有限。我们通过将QA-SRL2.0数据集转换为大规模OIE(LSOIE)数据集引入一个新的数据集。我们的LSOIE数据集比下一个最大的人类附加说明的OIE数据集大20倍。我们为LSOIE建立和评估了若干基准的OIE模型,为今后改进任务提供了基准。我们LSOIE的数据、模型和代码被公开。

相关内容

信息抽取

关注 350

信息抽取（Information Extraction: IE）是把文本里包含的信息进行结构化处理，变成表格一样的组织形式。输入信息抽取系统的是原始文本，输出的是固定格式的信息点。信息点从各种各样的文档中被抽取出来，然后以统一的形式集成在一起。这就是信息抽取的主要任务。信息以统一的形式集成在一起的好处是方便检查和比较。信息抽取技术并不试图全面理解整篇文档，只是对文档中包含相关信息的部分进行分析。至于哪些信息是相关的，那将由系统设计时定下的领域范围而定。

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

42+阅读 · 2020年4月11日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日