Novel Entity Discovery from Web Tables

When working with any sort of knowledge base (KB), one has to make sure it is as complete and as up-to-date as possible. Both tasks are non-trivial, as they require recall-oriented efforts to determine which entities and relationships are missing from the KB, and so they demand a significant amount of labor. Tables on the Web, on the other hand, are abundant and have the distinct potential to assist with these tasks. In particular, we can leverage the content of such tables to discover new entities, properties, and relationships. Because web tables typically contain only raw textual content, we first need to determine which cells refer to which known entities, a task we dub table-to-KB matching. This first task aims to infer table semantics by linking table cells and heading columns to elements of a KB. The second task builds upon these linked entities and properties not only to identify novel ones in the same table but also to bootstrap their type and additional relationships. We refer to this process as novel entity discovery and, to the best of our knowledge, it is the first endeavor on mining the unlinked cells in web tables. Our method identifies not only out-of-KB ("novel") information but also novel aliases for in-KB ("known") entities. When evaluated using three purpose-built test collections, we find that our proposed approaches obtain a marked improvement in precision over our baselines while keeping recall stable.
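To make the two-step pipeline concrete, the following is a minimal toy sketch, not the method evaluated in this work. It assumes a hypothetical in-memory KB (`TOY_KB`) and uses exact, case-insensitive string matching as a stand-in for table-to-KB matching. All names and identifiers are illustrative assumptions. Cells left without a link are the unlinked cells that novel entity discovery would then examine, either as out-of-KB entities or as novel aliases of known ones.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy KB (an assumption for illustration): entity id -> known surface forms.
TOY_KB: Dict[str, Set[str]] = {
    "kb:oslo": {"Oslo"},
    "kb:stavanger": {"Stavanger"},
}


def link_cell(cell: str, kb: Dict[str, Set[str]]) -> Optional[str]:
    """Return the id of the KB entity whose surface form equals the cell text
    (ignoring case and surrounding whitespace), or None if there is no match."""
    needle = cell.strip().lower()
    for entity_id, names in kb.items():
        if any(needle == name.lower() for name in names):
            return entity_id
    return None


def match_column(cells: Iterable[str], kb: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Step 1 (stand-in for table-to-KB matching): link every cell or mark it None."""
    return {cell: link_cell(cell, kb) for cell in cells}


def unlinked_cells(links: Dict[str, Optional[str]]) -> List[str]:
    """Step 2 (starting point only): collect cells that received no link.
    Whether such a cell is a novel entity or a novel alias is left undecided here."""
    return [cell for cell, entity_id in links.items() if entity_id is None]


column = ["Oslo", "stavanger ", "Sandnes"]
links = match_column(column, TOY_KB)
candidates = unlinked_cells(links)
```

Exact string matching is used only to keep the sketch short. A realistic matcher would also have to handle ambiguous and noisy cell content, which is precisely what makes the matching step non-trivial.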
