A window function is a generalization of the aggregation operation. Unlike aggregation, the cardinality of its output always equals the cardinality of its input. That is, the operator computes values of additional attributes for each row depending on the row's context, which is defined either by a sliding window or by a previously evaluated row. Window functions are a powerful tool that is popular among data analysts and supported by the majority of industrial DBMSes. They make it possible to express fairly complex use cases concisely, such as running sums and averages, local maxima and minima, and different types of ranking. Because such queries can be expressed without self-joins and correlated subqueries, they can be evaluated much more efficiently. In this paper we discuss an implementation of window functions inside a disk-based column-store with late materialization. Late materialization is a technique that defers the reconstruction of tuples from individual columns for as long as possible. Popular in the late 2000s, it is rarely considered nowadays. However, in the case of window functions it allows the memory footprint to be lowered substantially. Another contribution of this paper is the application of a segment tree to computing RANGE-based window functions.
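To make the last point concrete, the following is a minimal Python sketch of how a segment tree can answer RANGE-based window aggregates. It is an illustration under stated assumptions, not the implementation discussed in the paper: the names `SegmentTree` and `range_window` are hypothetical, the data is assumed to fit in memory, and the rows are assumed to be already sorted by the ordering key. For each row, the frame boundaries are found by binary search over the keys, and the aggregate over the frame is then obtained with one logarithmic-time tree query.

```python
from bisect import bisect_left, bisect_right


class SegmentTree:
    """Iterative segment tree over a fixed list (hypothetical helper).

    `op` must be associative and `identity` must be its neutral element,
    e.g. (addition, 0) or (max, -infinity).
    """

    def __init__(self, values, op, identity):
        self.n = len(values)
        self.op = op
        self.identity = identity
        # Leaves live at positions n..2n-1, inner nodes at 1..n-1.
        self.tree = [identity] * (2 * self.n)
        self.tree[self.n:] = values
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = op(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Combine the values at positions lo..hi-1 (half-open interval)."""
        left_acc = right_acc = self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                left_acc = self.op(left_acc, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                right_acc = self.op(self.tree[hi], right_acc)
            lo //= 2
            hi //= 2
        return self.op(left_acc, right_acc)


def range_window(keys, values, preceding, following, op, identity):
    """Evaluate an aggregate over a RANGE frame for every row.

    Assumes `keys` is sorted in ascending order. The frame of a row with
    key k contains all rows whose key lies in [k - preceding, k + following].
    """
    tree = SegmentTree(values, op, identity)
    result = []
    for k in keys:
        lo = bisect_left(keys, k - preceding)   # first row inside the frame
        hi = bisect_right(keys, k + following)  # one past the last row
        result.append(tree.query(lo, hi))
    return result


if __name__ == "__main__":
    keys = [1, 2, 4, 7, 8]
    values = [10, 20, 30, 40, 50]
    # Sum over RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
    print(range_window(keys, values, 2, 0, lambda a, b: a + b, 0))
    # prints [10, 30, 50, 40, 90]
```

A prefix-sum array would suffice for sums alone; the tree is used in this sketch because the same structure also handles aggregates that have no inverse, such as minimum and maximum, by swapping `op` and `identity`.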