Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either over simplify the models only using independent factorization which leads to inaccurate estimates, or over complicate them by lossless conditional factorization without any independent assumption which results in slow probability computation. In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. The key idea of FLAT is a novel unsupervised graphical model, called FSPN. It utilizes both independent and conditional factorization to adaptively model different levels of attributes correlations, and thus dovetails their advantages. FLAT supports efficient online probability computation in near liner time on the underlying FSPN model, provides effective offline model construction and enables incremental model updates. It can estimate cardinality for both single table queries and multi table join queries. Extensive experimental study demonstrates the superiority of FLAT over existing CardEst methods on well known IMDB benchmarks: FLAT achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost. We also integrate FLAT into Postgres to perform an end to end test. It improves the query execution time by 12.9% on the benchmark workload, which is very close to the optimal result 14.2% using the true cardinality.
翻译:查询优化器依赖准确的基本估计值( CardEst) 来制定良好的执行计划。 CardEst 的核心问题是如何以准确和紧凑的方式模拟丰富的共同属性分布。 尽管进行了数十年的研究, 现有的方法要么是仅使用独立的因数化简化模型,导致不准确的估计数,要么是在没有任何独立假设的情况下通过无损失的有条件因数化使模型复杂化,从而导致计算速度缓慢的概率计算。 在本文中, 我们提议FLAT, 即卡斯特方法, 在概率计算、 模型大小和准确的估算质量方面同时使用。 FLAT 的关键想法是一个新的不受监督的图形模型, 称为 FSPN。 它使用独立和有条件的因数化因数化方法, 以适应性的方式模拟不同的属性相关关系, 从而得出不准确的参数。 FLAT 支持在FPN模型的近线时间里进行高效的在线概率计算, 提供有效的离线模型构建, 并允许渐进式模型更新。 它可以通过查询来估计单表查询和多表的基点。 广泛的实验研究表明, FLAT 优于现有的卡斯特的接近的图形模型的图形模型, 方法的优于现有不易用FMDB press 2 级的精确度, 在精确度中, 的精确度中, 的精确度将1级中, 级的精确度为我们测算算为1 的精确度为1 的精确度为1 级的精确度为1 级, 。