In this work, we study the problem of computing a tuple's expected multiplicity, exactly and approximately, over probabilistic databases with bag semantics (where each tuple is associated with a multiplicity). We consider bag-TIDBs in which the maximum multiplicity of each tuple is bounded by $c$ and tuples are independent probabilistic events (we refer to such databases as $c$-TIDBs). We are specifically interested in the fine-grained complexity of computing expected multiplicities and how it compares to the complexity of deterministic query evaluation algorithms: if these complexities are comparable, it opens the door to practical deployment of probabilistic databases. Unfortunately, our results imply that computing expected multiplicities for $c$-TIDBs based on the results produced by such query evaluation algorithms introduces super-linear overhead (under parameterized complexity hardness assumptions/conjectures). We proceed to study approximation of expected result tuple multiplicities for positive relational algebra queries ($RA^+$) over $c$-TIDBs and for a non-trivial subclass of block-independent databases (BIDBs). We develop a sampling algorithm that computes a $(1\pm\epsilon)$-approximation of the expected multiplicity of an output tuple in time linear in the runtime of the corresponding deterministic query for any $RA^+$ query.
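To make the notion of expected multiplicity concrete, the following minimal sketch computes it for one output tuple of a toy 2-TIDB, once exactly by enumerating all possible worlds and once by naive Monte Carlo sampling of worlds. Everything in it is an illustrative assumption rather than material from this work: the tuple names `x` and `y`, their multiplicity distributions, and the lineage polynomial $X \cdot Y + X \cdot X$ are hypothetical, and the sampler is a generic baseline, not the linear-time $(1\pm\epsilon)$ algorithm developed here.

```python
import itertools
import random

# Hypothetical 2-TIDB (c = 2): each input tuple has an independent
# distribution over multiplicities 0..c. Values are made up for illustration.
TUPLES = {
    "x": {0: 0.5, 1: 0.3, 2: 0.2},
    "y": {0: 0.4, 1: 0.6, 2: 0.0},
}


def lineage(world):
    """Hypothetical lineage polynomial of one output tuple: X*Y + X*X."""
    return world["x"] * world["y"] + world["x"] * world["x"]


def exact_expectation(tuples, poly):
    """Sum poly(world) * Pr[world] over all (c+1)^n possible worlds."""
    names = list(tuples)
    total = 0.0
    for combo in itertools.product(*(tuples[n].items() for n in names)):
        prob = 1.0
        world = {}
        for name, (mult, p) in zip(names, combo):
            world[name] = mult
            prob *= p  # independence: world probability is a product
        total += prob * poly(world)
    return total


def sampled_expectation(tuples, poly, n_samples, seed=0):
    """Naive baseline: average poly over independently sampled worlds."""
    rng = random.Random(seed)
    names = list(tuples)
    acc = 0.0
    for _ in range(n_samples):
        world = {
            n: rng.choices(list(tuples[n]), weights=list(tuples[n].values()))[0]
            for n in names
        }
        acc += poly(world)
    return acc / n_samples


if __name__ == "__main__":
    print(exact_expectation(TUPLES, lineage))
    print(sampled_expectation(TUPLES, lineage, 20000))
```

In this toy instance the exact value is $E[XY] + E[X^2] = 0.42 + 1.1 = 1.52$, whereas substituting expectations into the polynomial would give $0.42 + 0.7^2 = 0.91$. The gap comes from the self-join term $X \cdot X$, since $E[X^2] \neq E[X]^2$ in general. Exhaustive enumeration is exponential in the number of input tuples, which is why it serves only as a reference point here.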