Cardinality estimation is a fundamental problem in database systems. To capture the rich joint data distribution of a relational table, most existing work uses either the data as unsupervised information or the query workload as supervised information. Very little work has used both types of information, and the little that does cannot fully exploit both to learn the joint data distribution. In this work, we aim to close the gap between data-driven and query-driven methods by proposing UAE, a new unified deep autoregressive model that learns the joint data distribution from both the data and the query workload. First, to enable the use of supervised query information in a deep autoregressive model, we develop differentiable progressive sampling using the Gumbel-Softmax trick. Second, UAE utilizes both types of information to learn the joint data distribution in a single model. Comprehensive experimental results demonstrate that UAE achieves single-digit multiplicative error at the tail and better accuracy than state-of-the-art methods, while being both space- and time-efficient.
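To make "differentiable progressive sampling using the Gumbel-Softmax trick" more concrete, here is a minimal forward-pass sketch. It assumes progressive sampling in the usual sense: walk the columns in order, restrict each conditional to the query's range, draw a value, and multiply the in-range probability masses. The Gumbel-Softmax step replaces the hard categorical draw with a relaxed one-hot vector `y`, so that in an autodiff framework (e.g. PyTorch or JAX) gradients of a query loss could flow back through `y` into the model. Everything here is hypothetical and illustrative: the two-column toy model `P_A`/`P_B_GIVEN_A` stands in for a deep autoregressive network, the function names are invented, and no gradients are computed in this pure-Python version. The `q_error` helper shows one common way to measure the "multiplicative error" mentioned above.

```python
import math
import random

# Hypothetical toy autoregressive model over two columns A and B (domain size 3 each).
# p(A) and p(B | A) are fixed tables here; a real model would produce them with a network.
P_A = [0.5, 0.3, 0.2]
P_B_GIVEN_A = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel(0, 1) noise) / tau)."""
    scores = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly in (0, 1)
        scores.append((logit - math.log(-math.log(u))) / tau)
    m = max(scores)  # assumes at least one finite logit
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def exact_selectivity(mask_a, mask_b):
    """Ground truth: sum over a in Q_A and b in Q_B of p(a) * p(b | a)."""
    return sum(
        P_A[a] * P_B_GIVEN_A[a][b]
        for a in range(3) if mask_a[a]
        for b in range(3) if mask_b[b]
    )


def soft_progressive_sample(mask_a, mask_b, tau, rng):
    """One relaxed progressive-sampling estimate of the query's selectivity."""
    # Step 1: probability mass of column A inside the query region.
    mass_a = sum(p for p, keep in zip(P_A, mask_a) if keep)
    # Step 2: relaxed sample of A, restricted to the query region via -inf logits.
    logits = [math.log(p) if keep else float("-inf") for p, keep in zip(P_A, mask_a)]
    y = gumbel_softmax(logits, tau, rng)
    # Step 3: condition B on the soft sample (a y-weighted mixture of rows of p(B | A)).
    p_b = [sum(y[a] * P_B_GIVEN_A[a][b] for a in range(3)) for b in range(3)]
    mass_b = sum(p for p, keep in zip(p_b, mask_b) if keep)
    # The estimate is the product of the in-region masses along the sampled path.
    return mass_a * mass_b


def q_error(estimate, truth):
    """Multiplicative error: max(estimate / truth, truth / estimate)."""
    return max(estimate / truth, truth / estimate)


if __name__ == "__main__":
    rng = random.Random(0)
    mask_a = [True, True, False]  # predicate A IN (0, 1)
    mask_b = [False, True, True]  # predicate B IN (1, 2)
    truth = exact_selectivity(mask_a, mask_b)
    n = 20000
    est = sum(soft_progressive_sample(mask_a, mask_b, 0.05, rng) for _ in range(n)) / n
    print(truth, est, q_error(est, truth))
```

With a small temperature `tau`, each relaxed sample is close to a hard one-hot draw, so averaging many estimates approaches the exact selectivity; a larger `tau` gives smoother gradients at the cost of more bias in this sketch.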