Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured text. Common IE solutions, including Relation Extraction (RE) and open IE systems, struggle to handle tuples that span multiple sentences, and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text relation tuples). To overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences. Based on this framework, we develop a large IE benchmark with high-quality human evaluation. The benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with several IE baselines on this benchmark, and the results show that our system achieves substantial improvements.
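The idea of casting triple extraction as question answering can be illustrated with a toy sketch. This is not the QA4IE model: the cue-phrase table `CUES`, the stand-in answerer `toy_qa`, and the helper `extract_triples` are all hypothetical names introduced here for illustration, and a trained QA model would take the place of `toy_qa`. The sketch only shows the general shape: each (subject, relation) pair acts as a query against the whole document, so the answer may come from a sentence that never names the subject.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical cue phrases standing in for a trained QA model (assumption).
CUES = {
    "place of birth": "was born in ",
    "award received": "won the ",
}


def toy_qa(document: str, subject: str, relation: str) -> Optional[str]:
    """Return the text after the relation's cue phrase, up to the next period.

    Returns None when the relation has no cue, the subject is absent from the
    document, or the cue phrase does not occur.
    """
    cue = CUES.get(relation)
    if cue is None or subject not in document:
        return None
    start = document.find(cue)
    if start == -1:
        return None
    start += len(cue)
    end = document.find(".", start)
    if end == -1:
        end = len(document)
    return document[start:end].strip()


def extract_triples(
    document: str,
    subject: str,
    relations: List[str],
    qa: Callable[[str, str, str], Optional[str]] = toy_qa,
) -> List[Triple]:
    """Query the document once per relation and keep only answered queries."""
    triples: List[Triple] = []
    for relation in relations:
        answer = qa(document, subject, relation)
        if answer:
            triples.append((subject, relation, answer))
    return triples


doc = "Marie Curie was born in Warsaw. She later won the Nobel Prize in Physics."
triples = extract_triples(
    doc, "Marie Curie", ["place of birth", "award received", "spouse"]
)
print(triples)
```

In this example the second triple is drawn from a sentence that refers to the subject only as "She", which is the cross-sentence case that sentence-level extractors tend to miss. The "spouse" query yields no triple because the toy answerer returns nothing for it.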