Statistical models of real-world data typically involve continuous probability distributions such as normal, Laplace, or exponential distributions. Such distributions are supported by many probabilistic modelling formalisms, including probabilistic database systems. Yet the traditional theoretical framework of probabilistic databases focusses entirely on finite probabilistic databases. Only recently have we set out to develop the mathematical theory of infinite probabilistic databases. The present paper is an exposition of two recent papers that are cornerstones of this theory. In (Grohe, Lindner; ICDT 2020) we propose a very general framework for probabilistic databases, possibly involving continuous probability distributions, and show that queries have a well-defined semantics in this framework. In (Grohe, Kaminski, Katoen, Lindner; PODS 2020) we extend the declarative probabilistic programming language Generative Datalog, proposed by (Bárány et al. 2017) for discrete probability distributions, to continuous probability distributions and show that such programs yield generative models of continuous probabilistic databases.