Knowledge about entities and their interrelations is crucial for tasks such as question answering and text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like tables or lists can help to increase the coverage of such knowledge graphs. In contrast to existing approaches, we do not focus on relationships within a listing (e.g., between two entities in a table row) but on the relationship between a listing's subject entities and the context of the listing. To that end, we propose a descriptive rule mining approach that uses distant supervision to derive rules for these relationships based on a listing's context. Once extracted from a suitable data corpus, the rules can be used to extend a knowledge graph with novel entities and assertions. In our experiments, we demonstrate that the approach is able to extract up to 3M novel entities and 30M additional assertions from listings in Wikipedia. We find that the extracted information is of high quality and thus suitable for extending Wikipedia-based knowledge graphs like DBpedia, YAGO, and CaLiGraph. For DBpedia, this would increase the number of covered entities by roughly 50%.
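To make the idea of context-based rules concrete, the following is a minimal sketch of distantly supervised rule mining over listings. It is not the paper's implementation: the representation of a listing's context as a single section-heading string, the toy data, the function names `mine_rules` and `apply_rules`, and the support and confidence thresholds are all assumptions chosen for illustration. Subject entities already present in the knowledge graph act as distant supervision; a rule "context implies assertion" is kept when enough known entities support it, and is then applied to the remaining entities.

```python
from collections import defaultdict

# Toy knowledge graph (hypothetical): entity -> set of (predicate, object) assertions.
kg = {
    "Album A": {("type", "Album")},
    "Album B": {("type", "Album")},
    "Album D": {("type", "Album")},
    "Person X": {("type", "Person")},
}

# Toy listings (hypothetical): (context, subject entities).
# The context is simplified to a section heading for this sketch.
listings = [
    ("Discography", ["Album A", "Album B", "Album C"]),
    ("Discography", ["Album D", "Album E"]),
    ("Band members", ["Person X", "Person Y"]),
]


def mine_rules(listings, kg, min_support=2, min_confidence=0.8):
    """Derive rules (context, assertion) -> confidence from entities known to the KG."""
    hits = defaultdict(lambda: defaultdict(int))  # context -> assertion -> count
    known = defaultdict(int)                      # context -> number of known entities
    for context, entities in listings:
        for entity in entities:
            if entity not in kg:
                continue  # unknown entities provide no supervision signal
            known[context] += 1
            for assertion in kg[entity]:
                hits[context][assertion] += 1
    rules = {}
    for context, per_assertion in hits.items():
        for assertion, support in per_assertion.items():
            confidence = support / known[context]
            if support >= min_support and confidence >= min_confidence:
                rules[(context, assertion)] = confidence
    return rules


def apply_rules(listings, kg, rules):
    """Return assertions implied by the rules that the KG does not contain yet."""
    new_assertions = set()
    for context, entities in listings:
        for entity in entities:
            for rule_context, assertion in rules:
                if rule_context == context and assertion not in kg.get(entity, set()):
                    new_assertions.add((entity, assertion))
    return new_assertions


rules = mine_rules(listings, kg)
new_assertions = apply_rules(listings, kg, rules)
print(rules)
print(sorted(new_assertions))
```

In this toy setting, only the "Discography" context has enough known entities to yield a rule; the "Band members" context falls below the support threshold, so no assertion is derived for "Person Y". A realistic system would need a much richer notion of context and a more careful treatment of entities with conflicting assertions.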