Range aggregate queries find frequent application in data analytics. In some use cases, approximate results are preferred over accurate results if they can be computed rapidly and satisfy approximation guarantees. Inspired by a recent indexing approach, we provide means of representing a discrete point data set by continuous functions that can then serve as compact index structures. More specifically, we develop a polynomial-based indexing approach, called PolyFit, for processing approximate range aggregate queries. PolyFit is capable of supporting multiple types of range aggregate queries, including COUNT, SUM, MIN and MAX aggregates, with guaranteed absolute and relative error bounds. Experiment results show that PolyFit is faster and more accurate and compact than existing learned index structures.
翻译:在一些使用案例中,如果能够快速计算并满足近似保证,则近似结果优于准确结果。在近期的指数化方法的启发下,我们提供了一种手段,以代表由连续功能组成的离散点数据集,这些功能可以作为紧凑的指数结构。更具体地说,我们开发了一种基于多边的指数化方法,称为“聚Fit”,用于处理近似范围的汇总查询。 PolyFit 能够支持多种类型的范围综合查询,包括COUNT、SUM、MIN和MAX,并有保证绝对和相对的误差界限。实验结果表明,PolyFit比现有的已学指数结构更快、更准确、更紧凑。