Uniform Reliability of Self-Join-Free Conjunctive Queries

From arXiv: 27 pages, including 17 pages of main text. Integrates all reviewer feedback. Apart from minor formatting differences and tweaks, this paper is the same as the ICDT'21 paper, with the addition of 10 pages of technical appendix.

The reliability of a Boolean Conjunctive Query (CQ) over a tuple-independent probabilistic database is the probability that the CQ is satisfied when the tuples of the database are sampled one by one, independently, with their associated probability. For queries without self-joins (repeated relation symbols), the data complexity of this problem is fully characterized in a known dichotomy: reliability can be computed in polynomial time for hierarchical queries, and is #P-hard for non-hierarchical queries. Hierarchical queries also characterize the tractability of queries for other tasks: having read-once lineage formulas, supporting insertion/deletion updates to the database in constant time, and having a tractable computation of tuples' Shapley and Banzhaf values. In this work, we investigate a fundamental counting problem for CQs without self-joins: how many sets of facts from the input database satisfy the query? This is equivalent to the uniform case of the query reliability problem, where the probability of every tuple is required to be 1/2. Of course, for hierarchical queries, uniform reliability is in polynomial time, like the reliability problem. However, it is an open question whether being hierarchical is necessary for the uniform reliability problem to be in polynomial time. In fact, the complexity of the problem has been unknown even for the simplest non-hierarchical CQs without self-joins. We solve this open question by showing that uniform reliability is #P-complete for every non-hierarchical CQ without self-joins. Hence, we establish that being hierarchical also characterizes the tractability of unweighted counting of the satisfying tuple subsets. We also consider the generalization to query reliability where all tuples of the same relation have the same probability, and give preliminary results on the complexity of this problem.
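To make the counting problem concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the paper's method. It assumes the query q() :- R(x), S(x, y), T(y), a commonly used example of a non-hierarchical CQ without self-joins, and it assumes the standard definition of "hierarchical" (for every two variables, the sets of atoms containing them are nested or disjoint), which the abstract does not spell out. The function names and the tiny database are hypothetical. The enumeration takes time exponential in the number of facts, which is what one would expect to be stuck with in general given the #P-completeness result.

```python
from fractions import Fraction
from itertools import combinations


def satisfies(facts):
    """True if the fact set satisfies q() :- R(x), S(x, y), T(y) (assumed example query)."""
    r_values = {args[0] for rel, args in facts if rel == "R"}
    t_values = {args[0] for rel, args in facts if rel == "T"}
    return any(
        rel == "S" and args[0] in r_values and args[1] in t_values
        for rel, args in facts
    )


def count_satisfying_subsets(db):
    """Count the subsets of db that satisfy the query, by exhaustive enumeration."""
    db = list(db)
    return sum(
        1
        for k in range(len(db) + 1)
        for subset in combinations(db, k)
        if satisfies(subset)
    )


def uniform_reliability(db):
    """Probability that the query holds when every fact is kept with probability 1/2."""
    db = list(db)
    return Fraction(count_satisfying_subsets(db), 2 ** len(db))


def is_hierarchical(atoms):
    """atoms maps an atom name to its set of variables.

    Standard definition (assumed here): for every pair of variables, the sets
    of atoms in which they occur are nested or disjoint.
    """
    variables = set().union(*atoms.values())
    occurs = {v: {a for a, vs in atoms.items() if v in vs} for v in variables}
    return all(
        occurs[u] <= occurs[v] or occurs[v] <= occurs[u] or not (occurs[u] & occurs[v])
        for u in variables
        for v in variables
    )


# Hypothetical five-fact database over the relations R, S, T.
EXAMPLE_DB = [
    ("R", (1,)),
    ("S", (1, 2)),
    ("S", (1, 3)),
    ("T", (2,)),
    ("T", (3,)),
]

if __name__ == "__main__":
    print(count_satisfying_subsets(EXAMPLE_DB))  # 7
    print(uniform_reliability(EXAMPLE_DB))       # 7/32
    print(is_hierarchical({"R": {"x"}, "S": {"x", "y"}, "T": {"y"}}))  # False
```

On this database, a satisfying subset must contain R(1) together with either S(1, 2) and T(2) or S(1, 3) and T(3). Of the 16 subsets of the four S and T facts, 9 contain neither pair, so 7 subsets satisfy the query, and dividing by 2^5 = 32 gives the uniform reliability 7/32.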
