Relation extraction (RE) is a sub-discipline of information extraction (IE) that focuses on the prediction of a relational predicate from a natural-language input unit (such as a sentence, a clause, or even a short paragraph consisting of multiple sentences and/or clauses). Together with named-entity recognition (NER) and disambiguation (NED), RE forms the basis for many advanced IE tasks such as knowledge-base (KB) population and verification. In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE by encoding structured information about the sentences' principal units, such as subjects, objects, verbal phrases, and adverbials, into various forms of vectorized (and hence unstructured) representations of the sentences. Our main conjecture is that the decomposition of long and possibly convoluted sentences into multiple smaller clauses via OpenIE even helps to fine-tune context-sensitive language models such as BERT (and its plethora of variants) for RE. Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models compared to existing RE approaches. Our best results reach F1 scores of 92% and 71% on KnowledgeNet and FewRel, respectively, demonstrating the effectiveness of our approach on competitive benchmarks.
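As a rough illustration of the enrichment idea described above, the following sketch appends serialized OpenIE clauses to each input sentence before classifying the relation with a fine-tuned BERT model. It is a minimal sketch, not the paper's exact encoding: the extract_clauses helper is a hypothetical placeholder for any OpenIE system, the [SEP]-based concatenation scheme and the NUM_RELATIONS value are assumptions, and the code relies on the Hugging Face transformers library.

```python
# Hypothetical sketch: enrich a sentence with OpenIE clauses, then classify its relation with BERT.
from typing import List, Tuple
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_RELATIONS = 15  # assumed size of the relation label set (e.g., KnowledgeNet properties)

def extract_clauses(sentence: str) -> List[Tuple[str, str, str]]:
    """Placeholder for an OpenIE extractor returning (subject, predicate, object) triples.

    Replace this stub with a real OpenIE system; it returns no triples by default.
    """
    return []

def enrich(sentence: str) -> str:
    """Append serialized OpenIE triples to the sentence, separated by [SEP] tokens (an assumed scheme)."""
    triples = extract_clauses(sentence)
    clause_text = " [SEP] ".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"{sentence} [SEP] {clause_text}" if clause_text else sentence

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=NUM_RELATIONS)

def classify_relation(sentence: str) -> int:
    """Predict a relation label index for an OpenIE-enriched sentence."""
    inputs = tokenizer(enrich(sentence), return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))
```

In this sketch the enriched text simply extends the original sentence, so the model sees both the raw context and the decomposed clauses; the actual paper may combine the two representations differently (e.g., via separate encodings), which this example does not attempt to reproduce.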