Recent work shows how offset-value coding speeds up database query execution, not only sorting but also duplicate removal and grouping(aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already saves thousands of CPUs in Google's Napa and F1 Query systems, e.g., in grouping algorithms and in log-structured merge-forests. In order to realize the full benefit of interesting orderings, however, query execution algorithms must not only consume and exploit offset-value codes but also produce offset-value codes for the next operator in the pipeline. Our research has investigated ways to produce offset-value codes without comparing successive output rows column-by-column. This short paper introduces a new theorem and, based on its proof and a simple corollary, describes in detail how order-preserving algorithms (from filter to merge join and even shuffle) can compute offset-value codes for their outputs. These calculations are surprisingly simple and very efficient.
翻译:最近的工作显示了抵消值编码如何加快数据库查询执行的速度,不仅对分类流进行分类,而且重复清除和分组(聚合),命令保存交换(shuffle),合并合并等等。它已经在Google的 Napa 和 F1 Query 系统中节省了数千个CPU,例如分组算法和日志结构合并-森林。然而,为了实现有趣的订单的全部好处,查询执行算法不仅必须消耗和开发抵消值代码,而且还必须为管道中下一位操作员制作抵消值代码。我们的研究调查了如何在不逐行比较连续输出行逐行的柱状的情况下生成抵消值代码。这份简短的论文引入了新的理论,并基于其证据和简单推论,详细描述订单保存算法(从过滤到合并合并,甚至抖动)如何计算其输出的抵消值代码。这些计算是惊人的简单和非常高效的。