Recent work has reemphasized the importance of cardinality estimates for query optimization. While new techniques have steadily improved in accuracy, they still generally allow underestimates, which often lead optimizers to make overly optimistic decisions. This can be very costly for expensive queries. An alternative approach is cardinality bounding, also called pessimistic cardinality estimation, in which the estimator provides guaranteed upper bounds on the true cardinality. By never underestimating, this approach allows the optimizer to avoid potentially inefficient plans. However, existing pessimistic cardinality estimators are not yet practical: they use very limited statistics on the data and cannot handle predicates. In this paper, we introduce SafeBound, the first practical system for generating cardinality bounds. SafeBound builds on recent theoretical work that uses degree sequences on join attributes to compute cardinality bounds, extends this framework with predicates, introduces a practical compression method for the degree sequences, and implements an efficient inference algorithm. Across four workloads, SafeBound achieves up to 80% lower end-to-end runtimes than PostgreSQL and is on par with or better than state-of-the-art ML-based estimators and pessimistic cardinality estimators, by improving the runtime of the expensive queries. It also reduces query planning time by up to 500x and uses up to 6.8x less space than state-of-the-art cardinality estimation methods.
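To illustrate how degree sequences on a join attribute can yield a guaranteed upper bound, here is a minimal sketch for a single two-table equijoin. It is not SafeBound's algorithm: it omits predicates, compression, and multi-way joins, and the function names (`degree_sequence`, `degree_sequence_bound`, `true_join_size`) are hypothetical. It relies only on the rearrangement inequality: pairing the two descending degree sequences position by position maximizes the sum of products, so the result can never fall below the true join size.

```python
from collections import Counter


def degree_sequence(values):
    """Frequencies of each distinct join value, sorted in descending order."""
    return sorted(Counter(values).values(), reverse=True)


def degree_sequence_bound(ds_r, ds_s):
    """Upper bound on |R JOIN S| from two descending degree sequences.

    Assumption: both inputs are sorted in descending order. The i-th largest
    degree of R is paired with the i-th largest degree of S, which is the
    worst case for the join size.
    """
    return sum(a * b for a, b in zip(ds_r, ds_s))


def true_join_size(r_values, s_values):
    """Exact equijoin size, computed from per-value frequencies."""
    r_counts, s_counts = Counter(r_values), Counter(s_values)
    return sum(count * s_counts[value] for value, count in r_counts.items())


# Hypothetical join-attribute columns of two tables R and S.
r = ["a", "a", "a", "b", "c"]
s = ["b", "b", "c", "c", "c", "d"]

bound = degree_sequence_bound(degree_sequence(r), degree_sequence(s))
actual = true_join_size(r, s)
print(bound, actual)  # prints "12 5"
```

The bound (12) is loose here because the most frequent value in R does not appear in S, but it is never an underestimate, which is the property a pessimistic estimator needs.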