Given a knowledge base (KB) containing (noisy) facts about common nouns or generics, such as "all trees produce oxygen" or "some animals live in forests", we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world and are crucial for various applications, such as question answering. Unlike commonly studied named-entity KBs such as Freebase, generics KBs involve quantification, have more complex underlying regularities, tend to be more incomplete, and violate the commonly used local closed-world assumption (LCWA). We show that existing KB completion methods struggle with this new task, and present the first approach that is successful. Our results demonstrate that external information, such as relation schemas and entity taxonomies, if used appropriately, can be a surprisingly powerful tool in this setting. First, our simple yet effective knowledge-guided tensor factorization approach achieves state-of-the-art results on two generics KBs (80% precise) for science, doubling their size at 74%-86% precision. Second, our novel taxonomy-guided, submodular active-learning method for collecting annotations about rare entities (e.g., oriole, a bird) is 6x more effective at inferring further new facts about them than multiple active-learning baselines.