Sample-based approximate query processing (AQP) has well-known pitfalls, such as the inability to answer highly selective queries and unreliable confidence intervals when sample sizes are small. Recent research proposed an intriguing solution: combining materialized, precomputed aggregates with sampling for more accurate and more reliable AQP. We explore this solution in detail and propose an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling. PASS builds a tree of partial aggregates that cover different partitions of the dataset. The leaf nodes of this tree form the strata for stratified samples. Aggregate queries whose predicates align with the partitions (or unions of partitions) are answered exactly with a depth-first search, and any partial overlaps are approximated with the stratified samples. We propose an algorithm for optimally partitioning the data into such a data structure, along with various practical approximation techniques.
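To make the structure concrete, the following is a minimal one-dimensional sketch of the idea, not the PASS implementation itself. It assumes a single numeric key, a SUM aggregate only, and simple equal-width binary splits in place of the optimized partitioning the paper proposes. The names `Node`, `build`, and `approx_sum` and the parameters `leaf_size` and `sample_size` are hypothetical, chosen for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[float, float]  # (key, value); hypothetical row layout


@dataclass
class Node:
    lo: float    # inclusive lower bound of the key range
    hi: float    # exclusive upper bound of the key range
    total: float  # exact SUM(value) over this partition
    count: int    # exact COUNT(*) over this partition
    children: List["Node"] = field(default_factory=list)
    sample: List[Row] = field(default_factory=list)  # filled for leaves only


def build(rows: List[Row], lo: float, hi: float,
          leaf_size: int, sample_size: int, rng: random.Random) -> Node:
    """Build a tree of partial aggregates over rows with lo <= key < hi.

    Assumption: equal-width splits; the paper optimizes the partitioning.
    """
    node = Node(lo, hi, sum(v for _, v in rows), len(rows))
    if len(rows) <= leaf_size or hi - lo <= 1e-9:
        # Leaf: this partition is one stratum; keep a uniform sample of it.
        node.sample = rng.sample(rows, min(sample_size, len(rows)))
        return node
    mid = (lo + hi) / 2
    left = [r for r in rows if r[0] < mid]
    right = [r for r in rows if r[0] >= mid]
    node.children = [
        build(left, lo, mid, leaf_size, sample_size, rng),
        build(right, mid, hi, leaf_size, sample_size, rng),
    ]
    return node


def approx_sum(node: Node, qlo: float, qhi: float) -> float:
    """Estimate SUM(value) WHERE qlo <= key < qhi with a depth-first search."""
    if node.count == 0 or qhi <= node.lo or qlo >= node.hi:
        return 0.0                      # no overlap with the predicate
    if qlo <= node.lo and node.hi <= qhi:
        return node.total               # fully covered: exact partial aggregate
    if not node.children:
        # Partially covered leaf: scale the matching sampled values by N / n.
        hits = sum(v for k, v in node.sample if qlo <= k < qhi)
        return node.count / len(node.sample) * hits
    return sum(approx_sum(c, qlo, qhi) for c in node.children)


rows = [(float(i), 1.0) for i in range(100)]
tree = build(rows, 0.0, 100.0, leaf_size=10, sample_size=5, rng=random.Random(0))
aligned = approx_sum(tree, 0.0, 50.0)   # predicate aligns with a partition
partial = approx_sum(tree, 0.0, 53.0)   # one leaf is only partially covered
```

In this sketch, the aligned query is answered entirely from precomputed aggregates, while the second query draws on a sample only for the single leaf that straddles the predicate boundary, so sampling error is confined to that stratum.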