We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find an equivalent propositional formula that minimizes the number of variable occurrences. Our work is partly motivated from probabilistic inference where read-once formulas are known to allow exact PTIME solutions and non-read-once formulas allow approximate solutions with an error that depends on the number of repetitions of variables. We embark on the challenge of characterizing the data complexity of this problem and show its connection to the query resilience problem. While the problem is NP-complete in general, we develop an encoding as max-flow problem that is guaranteed to give the exact solution for several queries (and otherwise approximate minimizations). We show that our encoding is guaranteed to return a read-once factorization if it exists. Our problem and approach is a complete solution that naturally recovers exact solutions for all known PTIME cases, as well as identifying additional queries for which the problem can be solved in PTIME.
翻译:我们考虑的是找到无自join合并质问来源的最小程度因素的问题,即我们想找到一个同等的假设公式,最大限度地减少可变事件的数量。我们的工作部分是出于概率推论,因为已知的读数公式允许精确的PTIME解决方案和非读数公式允许近似解决方案,其错误取决于变量的重复次数。我们着手应对这一问题的数据复杂性的定性挑战,并表明其与查询复原力问题的联系。虽然问题一般是NP-完整的,但我们将编码问题发展成最大流量问题,保证为若干查询提供准确的解决方案(或接近最小化 )。我们表明,如果存在的话,我们的编码保证返回一个可读数系数。我们的问题和方法是一个完整的解决方案,自然恢复所有已知的PTIME案例的准确解决方案,并查明在PTIME中可以解决问题的其他问题。