Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google's Napa and F1 Query systems as well as an experimental evaluation of performance and scalability.
翻译:排序和搜索是数据库查询处理的很大一部分内容,例如,以索引创建、索引维护、索引查找等形式;比较钥匙对齐是分拣和搜索工作的一个重要部分。我们致力于简单、高效地实施数十年之久、被忽视、有效的快速比较和快速分类技术,特别是抵消值编码。在这个过程中,我们是在与运行文档中的预缺字以及行式和列式存储结构中压缩技术的双重性,即主要关键列的预联串流和运行长度编码,建立了互利的关系。我们还发现了与分类流的消费者的有益关系,例如,合并平行流、流内聚合和合并。我们报告了在谷歌纳帕和F1Query系统中的执行情况,并对业绩和可缩放性进行了实验性评估。