MILIE: Modular & Iterative Multilingual Open Information Extraction

Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract the easy slots, then extract the difficult ones by conditioning on the easy slots, and thereby achieve a better overall extraction. Based on this hypothesis, we propose a neural OpenIE system, MILIE, that operates in an iterative fashion. Because it is iterative, the system is also modular: rule-based extraction systems can be seamlessly integrated with a neural end-to-end system, so that rule-based systems supply some extraction slots, which MILIE can then leverage to extract the remaining slots. We confirm our hypothesis empirically: MILIE outperforms state-of-the-art systems on multiple languages ranging from Chinese to Arabic. Additionally, we are the first to provide an OpenIE test dataset for Arabic and Galician.
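
The iterative, modular idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the extractors below (`rule_based_predicate`, `toy_subject`, `toy_object`), the verb lexicon, and the fixed predicate-first order are all hypothetical stand-ins chosen for illustration. The point is only the control flow, in which each slot extractor sees the slots already filled, and a rule-based component can supply one slot while other components fill the rest.

```python
from typing import Callable, Dict, List, Optional, Tuple

Slots = Dict[str, Optional[str]]
# An extractor sees the tokens plus the slots filled so far.
Extractor = Callable[[List[str], Slots], Optional[str]]

# Hypothetical toy lexicon; a real system would use a parser or a neural tagger.
VERBS = {"founded", "acquired", "wrote"}


def rule_based_predicate(tokens: List[str], known: Slots) -> Optional[str]:
    """Rule-based stand-in for the 'easy' slot: first token in the verb lexicon."""
    for tok in tokens:
        if tok in VERBS:
            return tok
    return None


def toy_subject(tokens: List[str], known: Slots) -> Optional[str]:
    """Conditioned on the predicate: take everything before it."""
    pred = known.get("predicate")
    if pred is None:
        return None
    return " ".join(tokens[: tokens.index(pred)]) or None


def toy_object(tokens: List[str], known: Slots) -> Optional[str]:
    """Conditioned on the predicate: take everything after it."""
    pred = known.get("predicate")
    if pred is None:
        return None
    return " ".join(tokens[tokens.index(pred) + 1 :]) or None


def extract_iteratively(sentence: str, order: List[Tuple[str, Extractor]]) -> Slots:
    """Fill slots one at a time, passing earlier slots to later extractors."""
    tokens = sentence.split()
    known: Slots = {"subject": None, "predicate": None, "object": None}
    for slot, extractor in order:
        known[slot] = extractor(tokens, known)
    return known


# Assumed order for this sketch: predicate first, then subject, then object.
ORDER = [
    ("predicate", rule_based_predicate),
    ("subject", toy_subject),
    ("object", toy_object),
]

triple = extract_iteratively("Ada founded the lab", ORDER)
print(triple)
```

Because the order is just a list of (slot, extractor) pairs, swapping a rule-based extractor for a learned one, or changing which slot is extracted first, does not require touching the loop.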

Information Extraction

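Information extraction (IE) turns the information contained in text into a structured, table-like form. The input to an information extraction system is raw text, and the output is a set of information items in a fixed format. These items are extracted from many kinds of documents and then integrated in a uniform representation; this is the main task of information extraction. The benefit of a uniform representation is that the information is easy to inspect and compare. Information extraction does not attempt to understand an entire document; it analyzes only the parts that contain relevant information. Which information counts as relevant is determined by the domain scope fixed when the system is designed.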
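
As a small illustration of raw text in and fixed-format records out, the sketch below uses a single regular expression for one narrow, assumed domain ("<person> joined <organization> in <year>"). The pattern, the field names, and the sample documents are hypothetical; real IE systems use far richer methods. Text outside the pattern is simply ignored, which mirrors the point that IE analyzes only the relevant parts of a document.

```python
import re
from typing import Dict, List

# Hypothetical pattern for one narrow domain: "<Name> joined <Org> in <Year>".
PATTERN = re.compile(
    r"(?P<person>[A-Z]\w+) joined (?P<org>[A-Z]\w+) in (?P<year>\d{4})"
)


def extract_records(documents: List[str]) -> List[Dict[str, str]]:
    """Collect every match from every document into one uniform list of records."""
    records = []
    for doc in documents:
        for match in PATTERN.finditer(doc):
            records.append(match.groupdict())
    return records


docs = [
    "After a long search, Maria joined Acme in 2015. The weather was fine.",
    "Press release: Chen joined Globex in 2019 as an engineer.",
]
records = extract_records(docs)
for record in records:
    print(record)
```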
