Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are constructed automatically by an ensemble of extraction techniques applied over diverse data sources. It is therefore important to establish the provenance of query results, that is, to determine how they were computed. Provenance has been shown to be useful for assigning confidence scores to results, for debugging the KG generation process itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries because their answers are needed often. However, KGs change continuously for reasons such as changes in the source data, improvements to the extraction techniques, and refinement or enrichment of information. This raises the problem of efficiently maintaining the provenance polynomials of complex graph pattern queries over large, dynamic KGs, rather than recomputing them from scratch each time the KG is updated. To address this problem, we present HUKA, which uses provenance polynomials to track the derivation of query results over knowledge graphs by encoding the edges involved in generating each answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates to the underlying KG, covering both insertions and deletions of facts. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveal that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.
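To make the idea of maintaining provenance polynomials under updates concrete, the following is a minimal sketch and not HUKA's actual algorithm. It assumes a single fixed two-hop pattern `?x -p-> ?y -q-> ?z`, distinct predicates `p` and `q`, and a unique identifier per edge; the class `TwoHopProvenance` and its methods are hypothetical names introduced only for illustration. Each answer's polynomial is stored as a set of monomials, where a monomial is the set of edge identifiers used in one derivation (exponents are not tracked in this simplification).

```python
from collections import defaultdict


class TwoHopProvenance:
    """Toy maintainer for the pattern ?x -p-> ?y -q-> ?z (hypothetical, not HUKA)."""

    def __init__(self, p, q):
        self.p, self.q = p, q
        self.edges = {}                 # edge id -> (subject, predicate, object)
        self.p_in = defaultdict(dict)   # y -> {edge id: x} for p-edges x -> y
        self.q_out = defaultdict(dict)  # y -> {edge id: z} for q-edges y -> z
        self.poly = defaultdict(set)    # (x, z) -> set of monomials (frozensets of edge ids)

    def insert(self, eid, s, pred, o):
        """Add one edge and create only the derivations that use it."""
        self.edges[eid] = (s, pred, o)
        if pred == self.p:
            self.p_in[o][eid] = s
            for qid, z in self.q_out[o].items():
                self.poly[(s, z)].add(frozenset((eid, qid)))
        if pred == self.q:
            self.q_out[s][eid] = o
            for pid, x in self.p_in[s].items():
                self.poly[(x, o)].add(frozenset((pid, eid)))

    def delete(self, eid):
        """Remove one edge and drop every monomial that mentions it."""
        s, pred, o = self.edges.pop(eid)
        if pred == self.p:
            self.p_in[o].pop(eid, None)
        if pred == self.q:
            self.q_out[s].pop(eid, None)
        # Naive scan over all answers; a real system would index monomials by edge.
        for key in list(self.poly):
            kept = {m for m in self.poly[key] if eid not in m}
            if kept:
                self.poly[key] = kept
            else:
                del self.poly[key]

    def polynomial(self, x, z):
        """Render the polynomial of answer (x, z) as a sum of products of edge ids."""
        monos = self.poly.get((x, z), set())
        return " + ".join(sorted("*".join(sorted(m)) for m in monos))


# Usage with made-up facts: two independent derivations of the answer (a, c).
kg = TwoHopProvenance("p", "q")
kg.insert("e1", "a", "p", "b1")
kg.insert("e2", "b1", "q", "c")
kg.insert("e3", "a", "p", "b2")
kg.insert("e4", "b2", "q", "c")
print(kg.polynomial("a", "c"))  # e1*e2 + e3*e4
kg.delete("e3")
print(kg.polynomial("a", "c"))  # e1*e2
```

The point of the sketch is that an insertion touches only the derivations that contain the new edge, and a deletion only removes the monomials that mention the deleted edge, so the standing query is never re-evaluated from scratch. Handling general graph patterns and scaling to KGs the size of YAGO or DBpedia is what the abstract attributes to HUKA and is outside this illustration.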