Integrating diverse extraction pathways using iterative predictions for Multilingual Open Information Extraction

In this paper we investigate a simple hypothesis for the Open Information Extraction (OpenIE) task: some elements of a triple may be easier to extract when the extraction is conditioned on prior extractions that are themselves easier to obtain. We successfully exploit this and propose MiLIE, a neural multilingual OpenIE system that iteratively extracts triples by conditioning extractions on different elements of the triple, leading to a rich set of extractions. The iterative nature of MiLIE also allows rule-based extraction systems to be seamlessly integrated with a neural end-to-end system, leading to improved performance. MiLIE outperforms state-of-the-art systems on multiple languages ranging from Chinese to Galician thanks to its ability to combine multiple extraction pathways. Our analysis confirms that certain elements of an extraction are indeed easier to extract than others. Finally, we introduce OpenIE evaluation datasets for two low-resource languages, Japanese and Galician.
