State-of-the-art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner so as not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or even more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions, and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or comparable performance to state-of-the-art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets a new state-of-the-art performance of 67.7% F1 on CaRB evaluated as OIE2016, while being 3.35x faster at inference than the previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show a 15% performance improvement on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish. Code and models are available at https://github.com/sberbank-ai/DetIE.
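The core idea above, a set-prediction loss that forces unique triplet predictions, follows the bipartite-matching recipe popularized by object detectors such as DETR. Below is a minimal, hypothetical PyTorch sketch of such an order-agnostic loss; the function name, tensor shapes, and the per-token negative log-likelihood matching cost are illustrative assumptions on our part, not the authors' implementation (see the linked repository for that).

```python
# A minimal sketch (NOT the DetIE implementation) of an order-agnostic
# loss based on bipartite matching for set prediction over triplet slots.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def bipartite_matching_loss(pred_logits: torch.Tensor,
                            gold_labels: torch.Tensor) -> torch.Tensor:
    """pred_logits: (num_slots, seq_len, num_tags) -- per-slot token tag scores.
    gold_labels:  (num_gold, seq_len) -- tag ids for each gold triplet.
    Returns a loss invariant to the ordering of the predicted triplets."""
    num_slots, seq_len, num_tags = pred_logits.shape
    num_gold = gold_labels.shape[0]
    log_probs = pred_logits.log_softmax(dim=-1)

    # Cost of assigning prediction slot i to gold triplet j: mean negative
    # log-likelihood of the gold tag sequence under that slot's distribution.
    cost = torch.zeros(num_slots, num_gold)
    for i in range(num_slots):
        for j in range(num_gold):
            cost[i, j] = -log_probs[i, torch.arange(seq_len), gold_labels[j]].mean()

    # The Hungarian algorithm finds the minimum-cost one-to-one assignment,
    # so each gold triplet is matched by exactly one prediction slot.
    row_ind, col_ind = linear_sum_assignment(cost.detach().numpy())

    # Cross-entropy is computed only on the matched (slot, gold) pairs.
    loss = sum(
        F.cross_entropy(pred_logits[i], gold_labels[j])
        for i, j in zip(row_ind, col_ind)
    )
    return loss / max(len(row_ind), 1)


# Usage example: 3 prediction slots, 2 gold triplets, 5 tokens, 4 tags.
logits = torch.randn(3, 5, 4)
gold = torch.randint(0, 4, (2, 5))
print(bipartite_matching_loss(logits, gold))
```

Because the matching is recomputed per example, permuting the gold triplets (or the prediction slots) leaves the loss unchanged, which is what removes the need for autoregressive, order-dependent decoding.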