Aggregation in relational databases is accomplished through hashing and sorting interval data, which is computationally expensive and scales poorly as data volumes grow. In this paper, we show how quantitative interval and time-series data in relational attributes can be represented using delta summary values rather than absolute values. This removes the need to sort in order to determine the row corresponding to some maximum timestamp, reducing the time complexity from at least O(n log n) towards O(n) and improving query execution times. We illustrate this new method in the relational algebra, present the implementation algorithmically, and test an implementation in two leading RDBMS products against the conventional equivalents. We found this delta summation technique to be most effective for use cases with additive, numerical data for which the latest values must be obtained frequently, and where row cardinalities are on the order of 10^5. In our experiments, the proposed delta summation technique executed up to 22.4% faster than the equivalent standard selection method, reduced the overall query cost in some circumstances by up to 24.0%, and reduced I/O load by up to 60.6%. These gains came with increased query costs for write operations, an increase in CPU time and memory allocation, uncertain performance at very low or very high cardinalities, and inconsistent results across different RDBMS platforms.
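The core idea can be illustrated with a minimal in-memory sketch. It is not the paper's implementation, which targets relational algebra and two RDBMS products. The row layout `(key, timestamp, value)` and the function names `latest_by_sort`, `to_deltas`, and `latest_by_delta_sum` are assumptions made for illustration only. The baseline orders rows by timestamp to find the latest value per key, while the delta variant stores each reading as a change from the previous one, so the latest value is a plain sum that needs no ordering.

```python
def latest_by_sort(rows):
    """Baseline: rows hold absolute values; order by timestamp, keep the last per key."""
    latest = {}
    for key, _ts, value in sorted(rows, key=lambda r: r[1]):  # O(n log n) sort
        latest[key] = value
    return latest


def to_deltas(rows):
    """Write-side step (hypothetical): store each value as the change since the
    previous reading for the same key. The first delta is measured from zero.
    This extra work at write time mirrors the higher write cost noted above."""
    previous = {}
    deltas = []
    for key, ts, value in sorted(rows, key=lambda r: r[1]):
        deltas.append((key, ts, value - previous.get(key, 0)))
        previous[key] = value
    return deltas


def latest_by_delta_sum(delta_rows):
    """Delta variant: the latest value per key is the sum of its deltas.
    A single O(n) pass; the order of the rows does not matter."""
    totals = {}
    for key, _ts, delta in delta_rows:
        totals[key] = totals.get(key, 0) + delta
    return totals


# Illustrative data only: two keys with readings at timestamps 1 to 3.
readings = [("a", 1, 10), ("b", 1, 5), ("a", 2, 15), ("b", 3, 2), ("a", 3, 12)]
stored = to_deltas(readings)
print(latest_by_sort(readings))
print(latest_by_delta_sum(stored))
```

Both calls return the same latest value per key. The sketch only works because the data is additive and numerical, which is the use case the abstract identifies as most suitable.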