DetIE: Multilingual Open Information Extraction Inspired by Object Detection

State-of-the-art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively, in an autoregressive or predicate-based manner, in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions, and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or similar performance compared with state-of-the-art models on standard benchmarks, in terms of both quality metrics and inference time. Our model sets a new state-of-the-art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than the previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show a 15% performance improvement on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish. Code and models are available at https://github.com/sberbank-ai/DetIE.
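
The abstract names the key ingredient, an order-agnostic loss based on bipartite matching, without spelling it out. What follows is a minimal sketch in the spirit of DETR-style set prediction, not DetIE's actual implementation. It assumes the model emits a fixed number of prediction "slots", each a per-token probability distribution over a hypothetical four-tag scheme (outside, subject, relation, object), and it assumes the matching cost is the summed per-token negative log-likelihood. The names `slot_cost`, `matching_loss`, `NUM_TAGS`, and `OUTSIDE` are illustrative. Gold triplets are padded with all-outside "empty" targets, and the lowest-cost one-to-one assignment is found by exhaustive search, which is only practical for a few slots; the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) is the usual scalable choice.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical tag set: 0 = outside, 1 = subject, 2 = relation, 3 = object.
NUM_TAGS = 4
OUTSIDE = 0


def slot_cost(probs: Sequence[Sequence[float]], tags: Sequence[int]) -> float:
    """Negative log-likelihood of one gold tag sequence under one prediction slot.

    probs[t][k] is the predicted probability of tag k for token t.
    """
    return -sum(math.log(probs[t][tag]) for t, tag in enumerate(tags))


def matching_loss(
    slots: List[List[List[float]]],
    gold: List[List[int]],
) -> Tuple[float, Tuple[int, ...]]:
    """Order-agnostic loss: min-cost one-to-one assignment of targets to slots.

    Gold triplets are padded with all-OUTSIDE "empty" targets so that every
    slot is matched. Brute force over permutations costs n! steps, so it is
    only feasible for a handful of slots.
    """
    n_slots, n_tokens = len(slots), len(slots[0])
    if len(gold) > n_slots:
        raise ValueError("more gold triplets than prediction slots")
    empty = [OUTSIDE] * n_tokens
    targets = gold + [empty] * (n_slots - len(gold))
    best_cost: float = math.inf
    best_perm: Tuple[int, ...] = ()
    # perm[i] is the index of the target assigned to slot i.
    for perm in itertools.permutations(range(n_slots)):
        cost = sum(slot_cost(slots[i], targets[perm[i]]) for i in range(n_slots))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

Because the loss is a minimum over all assignments, reordering the gold triplets leaves it unchanged. And because each target can be claimed by only one slot, two slots predicting the same triplet cannot both be rewarded for it; the extra slot is matched to some other target (often an empty one) and penalized, which is how a matching-based loss discourages duplicates without iterative decoding.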

Related content

Information Extraction

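Information Extraction (IE) turns the information contained in text into a structured, table-like form. The input to an information extraction system is raw text; the output is information points in a fixed format. Information points are extracted from many different documents and then integrated in a uniform form, which is the main task of information extraction. Integrating information in a uniform form makes it easy to inspect and compare. Information extraction does not try to understand an entire document; it analyzes only the parts of the document that contain relevant information. Which information counts as relevant is determined by the domain scope set when the system is designed.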
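
The input/output contract described above can be illustrated with a deliberately tiny, rule-based sketch. Everything in it is hypothetical: the domain scope is assumed to be a single "founded in" relation captured by a regular expression, and the `Record`, `FOUNDED_IN`, `extract`, and `to_table` names are illustrative. Text that does not match the pattern is ignored, mirroring the point that an IE system analyzes only the relevant parts of a document, and records from every document land in one uniform table.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """One fixed-format information point (a row of the output table)."""
    doc_id: str
    subject: str
    relation: str
    obj: str


# Hypothetical domain scope: only "<Name> was founded in <year>" facts are relevant.
FOUNDED_IN = re.compile(r"\b([A-Z][\w&.-]*(?: [A-Z][\w&.-]*)*) was founded in (\d{4})\b")


def extract(doc_id: str, text: str) -> List[Record]:
    """Turn raw text into records; any text outside the pattern is ignored."""
    return [
        Record(doc_id, m.group(1), "founded_in", m.group(2))
        for m in FOUNDED_IN.finditer(text)
    ]


def to_table(docs: Dict[str, str]) -> List[Record]:
    """Integrate records from many documents into one uniform table."""
    rows: List[Record] = []
    for doc_id, text in docs.items():
        rows.extend(extract(doc_id, text))
    return rows
```

A learned OpenIE system replaces the hand-written pattern with a model and, being open, does not fix the set of relations in advance; the uniform record format is what stays the same.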
