The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs), like Knowledge Graphs (KGs), to tackle important tasks like query answering under dependencies or data cleaning. A general problem of chase algorithms is that they might perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the execution of the rules avoiding redundant computations. We present the results of an extensive theoretical and empirical study that seeks to answer when and how TGs can be computed and what are the benefits of TGs when applied over real-world KBs. Our results include introducing algorithms that compute (minimal) TGs. We implemented our approach in a new engine, and our experiments show that it can be significantly more efficient than the chase enabling us to materialize KBs with 17B facts in less than 40 min on commodity machines.
翻译:追逐是用来实现知识库(KBs)的一套既定的算法,例如知识图(KGs),用来处理重要任务,例如依赖性或数据清理下的查询回答。追逐算法的一个普遍问题是它们可能进行多余的计算。为了解决这个问题,我们引入了Tigger 图形(TGs)的概念,用以指导规则的执行,避免重复计算。我们展示了广泛的理论和经验研究的结果,它寻求答案,即何时以及如何计算TGs,在应用到现实世界的KBs时TGs的好处是什么。我们的结果包括引入计算(最小)TGs的算法。我们在新的引擎中应用了我们的方法,我们的实验表明,它比追逐能让我们在商品机器上用不到40分钟的17B事实实现KBs效率要高得多。