Recent beyond worst-case optimal join algorithms Minesweeper and its generalization Tetris have brought the theory of indexing and join processing together by developing a geometric framework for joins. These algorithms take as input an index $\mathcal{B}$, referred to as a box cover, that stores output gaps that can be inferred from traditional indexes, such as B+ trees or tries, on the input relations. The performances of these algorithms highly depend on the certificate of $\mathcal{B}$, which is the smallest subset of gaps in $\mathcal{B}$ whose union covers all of the gaps in the output space of a query $Q$. We study how to generate box covers that contain small size certificates to guarantee efficient runtimes for these algorithms. First, given a query $Q$ over a set of relations of size $N$ and a fixed set of domain orderings for the attributes, we give a $\tilde{O}(N)$-time algorithm called GAMB which generates a box cover for $Q$ that is guaranteed to contain the smallest size certificate across any box cover for $Q$. Second, we show that finding a domain ordering to minimize the box cover size and certificate is NP-hard through a reduction from the 2 consecutive block minimization problem on boolean matrices. Our third contribution is a $\tilde{O}(N)$-time approximation algorithm called ADORA to compute domain orderings, under which one can compute a box cover of size $\tilde{O}(K^r)$, where $K$ is the minimum box cover for $Q$ under any domain ordering and $r$ is the maximum arity of any relation. This guarantees certificates of size $\tilde{O}(K^r)$. We combine ADORA and GAMB with Tetris to form a new algorithm we call TetrisReordered, which provides several new beyond worst-case bounds. On infinite families of queries, TetrisReordered's runtimes are unboundedly better than the bounds stated in prior work.
翻译:超越了最近最坏情况的最佳合并算法 { 扫雷者 及其通用化 Tetris 通过开发组合的几何框架, 带来了索引和合并处理的理论。 这些算法将输入一个 $\ mathcal{B} 美元, 被称为一个盒子封面, 存储从传统指数( 如 B+ 树或尝试) 中推导出来的输出缺口。 这些算法的性能高度取决于 $\ mathcal{ B} 及其通用化 斯特里尔 的证书 。 这是美元=mathcal{B} 美元中最小差距的一组。 这些算法将包含所有输出空间中的所有缺口。 我们研究如何生成包含小范围证书的框 $\ mathcal{B} 用于保证这些算法的有效运行时间 。 首先, 如果从一个大小为美元和固定的域内排序, 我们给出了 $tretiread{O} (N) 美元-tiretime 算算算算算算算算算算算算出新的一个框, 保证包含最小范围的 美元 美元 美元 Qrealderealr= kdereax 。