The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) -- a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.
翻译:随着机器学习驱动的应用程序生成大量不确定性数据库,不确定性在数据管理中的作用变得比以往更加突出。已知的一种查询这种数据库的方法是将基于规则的推理与不确定性结合起来。然而,目前为止提出的技术在处理大型数据库时存在困难。本文通过介绍一种新的概率推理技术来解决这个问题,这种技术利用触发图(TGs)——一种近期针对非概率性问题引入的概念。直观地说,TGs 可以通过避免显式材料化谱系和将同一事实的相似推导组合在一起来有效地存储概率模型。首先,我们展示了 TGs 如何适应可行世界语义。然后,我们描述了有效计算概率模型的技术,并且正式确立了我们的方法的正确性。我们还使用一个名为 LTGs 的原型进行了广泛的实证评估。我们与其他领先引擎的比较表明,即使是相对于近似推理技术,LTGs 不仅更快,而且还可以推理其他引擎无法扩展到的概率数据库。