Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as integers, strings, or real numbers, it is often more natural to view PDBs as infinite probability spaces over database instances. In this paper, we lay the mathematical foundations of infinite probabilistic databases. We then focus on independence assumptions. Tuple-independent PDBs play a central role in the theory and practice of PDBs. Here, we study infinite tuple-independent PDBs as well as related models such as infinite block-independent disjoint PDBs. While the standard model of PDBs is based on set semantics, we also study tuple-independent PDBs with bag semantics, as well as independence in PDBs over uncountable fact spaces. We also propose a new approach to PDBs with an open-world assumption, which addresses issues raised by Ceylan et al. (Proc. KR 2016) and generalizes their work, which is still rooted in finite tuple-independent PDBs. Moreover, for countable PDBs we propose an approximate query answering algorithm.
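To make the idea of a countably infinite tuple-independent PDB and approximate query answering concrete, here is a minimal sketch under loudly stated assumptions; it is not the algorithm from the paper. We assume a hypothetical fact space {R(0), R(1), R(2), ...} in which fact R(n) is present independently with marginal probability 2^-(n+1). Because these marginals sum to 1 (a finite value), the Borel-Cantelli lemma guarantees that a sampled instance is finite with probability 1. For an existential query "some R(n) with property pred is present", truncating to the first N facts underestimates the true probability by at most the tail mass sum_{n >= N} 2^-(n+1) = 2^-N (a union bound), which yields a guaranteed interval of width at most eps. The names `marginal`, `tail_bound`, and `approx_exists` are illustrative only.

```python
from typing import Callable


def marginal(n: int) -> float:
    """Hypothetical marginal probability of fact R(n): 2^-(n+1).

    The marginals sum to 1 < infinity, so by Borel-Cantelli almost every
    instance drawn from this tuple-independent PDB is finite.
    """
    return 2.0 ** -(n + 1)


def tail_bound(N: int) -> float:
    """Exact tail mass sum_{n >= N} marginal(n) for the geometric marginals above."""
    return 2.0 ** -N


def approx_exists(pred: Callable[[int], bool], eps: float) -> tuple[float, float]:
    """Return (lo, hi) with lo <= P(exists n: R(n) present and pred(n)) <= hi.

    Only facts R(0), ..., R(N-1) are inspected, where N is the smallest index
    whose tail mass is at most eps, so hi - lo <= eps.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    N = 0
    while tail_bound(N) > eps:
        N += 1
    # By tuple independence, P(no qualifying fact among the first N) is a product.
    prob_none = 1.0
    for n in range(N):
        if pred(n):
            prob_none *= 1.0 - marginal(n)
    lo = 1.0 - prob_none
    # Facts beyond index N can raise the probability by at most the tail mass.
    hi = min(1.0, lo + tail_bound(N))
    return lo, hi


if __name__ == "__main__":
    # Query: "some fact R(n) with even n is present".
    lo, hi = approx_exists(lambda n: n % 2 == 0, eps=1e-6)
    print(f"P(query) lies in [{lo:.7f}, {hi:.7f}]")
```

The truncation point N is chosen from the tail bound alone, so the same loop works for any predicate; for other marginal sequences one would need some computable upper bound on the tail sum, which exists only when the marginals are summable.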