Probabilistic databases (PDBs) model uncertainty in data in a quantitative way. In the established formal framework, probabilistic (relational) databases are finite probability spaces over relational database instances. This finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016), and with application scenarios that are better modeled by continuous probability distributions (Dalvi et al., CACM 2009). We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a primary focus on countably infinite spaces. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerning the measurability of events and queries and, ultimately, the question of whether queries have well-defined semantics. We argue that finite point processes are an appropriate model from probability theory for dealing with general probabilistic databases. This allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries.