Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans that take orders of magnitude longer to complete than plans generated with the true cardinality. Cardinality bounding avoids this pitfall by computing a strict upper bound on the query's output size using statistics about the database, such as table sizes and degrees, i.e., value frequencies. In this paper, we extend this line of work by proving a novel bound, called the Degree Sequence Bound, which takes into account the full degree sequences and the maximum tuple multiplicity. This bound improves upon previous work on degree constraints, which focused on the maximum degree rather than the full degree sequence. Further, we describe how to compute this bound in practice using a learned approximation of the true degree sequences.
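
To make the contrast between maximum-degree and degree-sequence statistics concrete, the following is a minimal sketch for the simplest case only: a single equality join between two tables on one column. It is an illustration under stated assumptions, not the algorithm developed in the paper. The function names and the toy data are hypothetical. The sketch assumes the exact degree sequences are available, whereas the paper works with a learned approximation, and it ignores the maximum tuple multiplicity. In this special case, pairing the i-th largest degree of one column with the i-th largest degree of the other gives an upper bound on the join size by the rearrangement inequality.

```python
from collections import Counter


def degree_sequence(column):
    """Value frequencies of a join column, sorted from largest to smallest."""
    return sorted(Counter(column).values(), reverse=True)


def max_degree_bound(r_col, s_col):
    """Upper bound that uses only the table sizes and the maximum degrees."""
    r_deg = degree_sequence(r_col)
    s_deg = degree_sequence(s_col)
    if not r_deg or not s_deg:
        return 0
    # Every tuple of R joins with at most max-degree(S) tuples, and vice versa.
    return min(len(r_col) * s_deg[0], len(s_col) * r_deg[0])


def degree_sequence_bound(r_col, s_col):
    """Upper bound for a single two-table join (illustrative special case).

    Pairs the i-th largest degree in R with the i-th largest degree in S.
    zip stops at the shorter sequence, which matches the fact that
    unmatched values contribute nothing to the join.
    """
    r_deg = degree_sequence(r_col)
    s_deg = degree_sequence(s_col)
    return sum(a * b for a, b in zip(r_deg, s_deg))


def true_join_size(r_col, s_col):
    """Exact size of the equality join on the two columns."""
    r_cnt = Counter(r_col)
    s_cnt = Counter(s_col)
    return sum(count * s_cnt[value] for value, count in r_cnt.items())


# Hypothetical toy join columns.
r_col = ["a", "a", "a", "b", "c"]   # degree sequence [3, 1, 1]
s_col = ["a", "b", "b", "c"]        # degree sequence [2, 1, 1]

print(true_join_size(r_col, s_col))         # 6
print(degree_sequence_bound(r_col, s_col))  # 8
print(max_degree_bound(r_col, s_col))       # 10
```

On this toy input the true join size is 6, the degree-sequence sketch gives 8, and the maximum-degree bound gives 10, so both are valid upper bounds and the one using the full degree sequences is tighter.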