Open Information Extraction (OIE) methods extract large numbers of triples of the form (noun phrase, relation phrase, noun phrase) from text, and these triples make up large Open Knowledge Bases (OKBs). However, the noun phrases (NPs) and relation phrases (RPs) in OKBs are not canonicalized and often appear as different paraphrased textual variants, which leads to redundant and ambiguous facts. Two related tasks address this problem: OKB canonicalization (i.e., converting NPs and RPs to a canonical form) and OKB linking (i.e., linking NPs and RPs to their corresponding entities and relations in a curated Knowledge Base such as DBPedia). These two tasks are tightly coupled, and each can benefit significantly from the other. However, they have so far been studied in isolation. In this paper, we explore the task of joint OKB canonicalization and linking for the first time and propose JOCL, a novel framework based on a factor graph model, so that the two tasks reinforce each other. JOCL is flexible enough to combine different signals from both tasks and can be extended to incorporate new signals. A thorough experimental study over two large-scale OIE triple data sets shows that our framework outperforms all the baseline methods on OKB canonicalization in terms of average F1 and on OKB linking in terms of accuracy.
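To make the two task definitions concrete, the following toy sketch shows how a linking result induces a canonicalization: phrases linked to the same knowledge-base entry fall into the same cluster. It is not JOCL and involves no factor graph; the triples, the `kb:` identifiers, and the helper names `clusters_from_links` and `canonicalize_triples` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy OIE triples (noun phrase, relation phrase, noun phrase); illustrative data only.
triples = [
    ("Barack Obama", "was born in", "Honolulu"),
    ("Obama", "took birth in", "Honolulu, Hawaii"),
    ("NYC", "is located in", "New York State"),
]

# Hypothetical OKB linking output: noun phrase -> made-up curated-KB identifier.
np_links = {
    "Barack Obama": "kb:Barack_Obama",
    "Obama": "kb:Barack_Obama",
    "Honolulu": "kb:Honolulu",
    "Honolulu, Hawaii": "kb:Honolulu",
    "NYC": "kb:New_York_City",
    "New York State": "kb:New_York_(state)",
}


def clusters_from_links(links):
    """Group phrases that link to the same KB entry into one cluster."""
    groups = defaultdict(set)
    for phrase, entity in links.items():
        groups[entity].add(phrase)
    return {entity: frozenset(phrases) for entity, phrases in groups.items()}


def canonicalize_triples(triples, links):
    """Replace each noun phrase by its linked entry; unlinked phrases stay as-is."""
    return [(links.get(s, s), r, links.get(o, o)) for s, r, o in triples]


np_clusters = clusters_from_links(np_links)
canonical = canonicalize_triples(triples, np_links)

for subject, relation, obj in canonical:
    print(subject, "|", relation, "|", obj)
```

In this sketch the first two triples end up with identical subject and object after canonicalization, while their relation phrases ("was born in", "took birth in") remain distinct variants; handling RPs would need an analogous mapping to KB relations.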