Beyond Equi-joins: Ranking, Enumeration and Factorization

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration, where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with $n$ denoting the number of tuples in the database, we guarantee, for acyclic full join queries with inequality conditions, that for every value of $k$ the $k$ top-ranked answers are returned in $\mathcal{O}(n \operatorname{polylog} n + k \log k)$ time. This is within a polylogarithmic factor of the best known complexity for equi-joins and even of $\mathcal{O}(n+k)$, the time it takes to look at the input and return $k$ answers in any order. Our guarantees extend to join queries with selections and many types of projections, such as the so-called free-connex queries. Remarkably, they hold even when the entire output is of size $n^\ell$ for a join of $\ell$ relations. The key ingredient is a novel $\mathcal{O}(n \operatorname{polylog} n)$-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first non-trivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.
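To make the factorization idea more concrete, here is a minimal Python sketch of the simplest case: a two-relation join with a single inequality $R.A < S.B$, ranked by the sum of tuple weights. It assumes each relation is a list of hypothetical `(key, weight)` pairs and uses a segment-tree decomposition of $S$ (sorted on $B$) in which every node stores its tuples sorted by weight. Each $R$ tuple then links to $\mathcal{O}(\log n)$ canonical nodes instead of up to $n$ tuples of $S$, so the structure has $\mathcal{O}(n \log n)$ size while encoding a possibly quadratic output. The names `build_canonical_nodes`, `cover`, and `ranked_inequality_join` are hypothetical; this illustrates the general technique under these assumptions and is not the authors' construction. In particular, it seeds the heap with all node streams up front, so it does not reproduce the $\mathcal{O}(n \operatorname{polylog} n + k \log k)$ bound exactly.

```python
import heapq
from bisect import bisect_right


def build_canonical_nodes(s_sorted):
    """Segment tree over S (already sorted by join key B).

    Node i covers the contiguous range [lo, hi) of s_sorted and stores
    those tuples re-sorted by weight; total size is O(n log n)."""
    nodes = {}

    def build(node_id, lo, hi):
        nodes[node_id] = (lo, hi, sorted(s_sorted[lo:hi], key=lambda t: t[1]))
        if hi - lo > 1:
            mid = (lo + hi) // 2
            build(2 * node_id, lo, mid)
            build(2 * node_id + 1, mid, hi)

    if s_sorted:
        build(1, 0, len(s_sorted))
    return nodes


def cover(nodes, node_id, start):
    """Return the O(log n) disjoint canonical nodes that exactly cover the suffix [start, n)."""
    lo, hi, _ = nodes[node_id]
    if hi <= start:
        return []
    if lo >= start:
        return [node_id]
    # Partial overlap: lo < start < hi implies the node has two children.
    return cover(nodes, 2 * node_id, start) + cover(nodes, 2 * node_id + 1, start)


def ranked_inequality_join(R, S, k):
    """Return up to k triples (cost, r, s) with r.A < s.B, in ascending cost = r.w + s.w."""
    s_sorted = sorted(S, key=lambda t: t[0])
    keys = [t[0] for t in s_sorted]
    nodes = build_canonical_nodes(s_sorted)

    # One sorted stream per (r, canonical node); seed each with its cheapest tuple.
    heap = []
    for r in R:
        start = bisect_right(keys, r[0])  # first S tuple with B > A
        if start == len(s_sorted):
            continue
        for nid in cover(nodes, 1, start):
            heap.append((r[1] + nodes[nid][2][0][1], r, nid, 0))
    heapq.heapify(heap)

    # Merge the streams: pop the global minimum, then advance that stream.
    out = []
    while heap and len(out) < k:
        cost, r, nid, idx = heapq.heappop(heap)
        tuples = nodes[nid][2]
        out.append((cost, r, tuples[idx]))
        if idx + 1 < len(tuples):
            heapq.heappush(heap, (r[1] + tuples[idx + 1][1], r, nid, idx + 1))
    return out


if __name__ == "__main__":
    R = [(1, 5), (3, 1)]          # (A, weight)
    S = [(2, 4), (4, 2), (5, 3)]  # (B, weight)
    print([cost for cost, _, _ in ranked_inequality_join(R, S, 3)])  # [3, 4, 7]
```

Because the suffix of qualifying $S$ tuples is split into disjoint canonical nodes, every join answer belongs to exactly one (r, node) stream, and merging those sorted streams through the heap yields the answers in ranking order without ever materializing the full join.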

翻译：我们一般地研究Tata-joins, 并特别以不平等的连结和脱钩方式加入上游, 重点是按排序顺序递归答案的排名计数。我们的方法具有很强的时间和空间复杂性。我们的方法实现了强大的时间和空间复杂性: 以美元来分辨数据库中的 Tuples 数量, 我们保证以不平等条件将整个查询与不平等条件结合, 每值为 $, 最高值的答案以美元( n\ operator name{polylog} n + k\log k) 的时间返回。这属于已知最复杂的equi- jojoin 和甚至$\ mathcal{O} (n+k) $(n+k) 。我们的保证范围扩大到选择和许多类型的预测, 比如所谓的自由Conex 查询。显而易见, 当整个输出为 equal- $_ liver=xxxxxxxxxal- crial_ liveral- exal- ligal- trainal- laismations ex as the keystemations.