Multidimensional data are now produced in many domains; because large volumes of such data are often used in complex analytical tasks, they must be stored compactly and support fast query responses. Existing compression schemes reduce storage requirements effectively; however, they may increase the overall computational cost of query processing. Querying compressed data efficiently requires a compression scheme designed carefully for those tasks. This study presents SEACOW, a novel compression scheme for storing and querying multidimensional array data. The scheme is based on the wavelet transform and exploits the hierarchical relationship between sub-arrays in the transformed data to compress the array. The compressed result embeds a synopsis that acts as an index and improves query processing performance. For our experiments, we implemented an array database, SEACOW storage, and evaluated query processing performance on real data sets. Our experiments show that 1) SEACOW achieves a high compression ratio comparable to existing compression schemes and 2) the synopsis improves analytical query processing performance.
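To make the idea of a wavelet-based synopsis concrete, the following is a minimal sketch of one level of a 2D Haar transform (averaging variant) in plain Python. It is not SEACOW's actual algorithm, which the abstract does not specify; the function names `haar2d_level` and `inverse_haar2d_level` and the sample data are assumptions chosen for illustration. The sketch shows how the transform splits an array into a coarse approximation sub-array plus detail sub-arrays, and how the approximation alone can answer an aggregate query.

```python
def haar2d_level(array):
    """One level of an averaging 2D Haar transform (illustrative only).

    Splits `array` into four sub-arrays over 2x2 blocks: the approximation
    band (block means) and three detail bands.
    Assumes `array` is a list of equal-length rows with even height and width.
    """
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(array), 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, len(array[0]), 2):
            a, b = array[i][j], array[i][j + 1]
            c, d = array[i + 1][j], array[i + 1][j + 1]
            a_row.append((a + b + c + d) / 4)  # block mean
            h_row.append((a - b + c - d) / 4)  # left/right difference
            v_row.append((a + b - c - d) / 4)  # top/bottom difference
            d_row.append((a - b - c + d) / 4)  # diagonal difference
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def inverse_haar2d_level(approx, horiz, vert, diag):
    """Rebuild the original array from the four bands of haar2d_level."""
    result = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            A, H, V, D = approx[i][j], horiz[i][j], vert[i][j], diag[i][j]
            top.extend([A + H + V + D, A - H + V - D])
            bottom.extend([A + H - V - D, A - H - V + D])
        result.extend([top, bottom])
    return result


# Hypothetical sample data: three of the four 2x2 blocks are constant.
data = [
    [8, 8, 2, 4],
    [8, 8, 6, 8],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]

approx, horiz, vert, diag = haar2d_level(data)

# The approximation band is a quarter-size summary of the array.
synopsis_mean = sum(map(sum, approx)) / (len(approx) * len(approx[0]))
exact_mean = sum(map(sum, data)) / (len(data) * len(data[0]))

# Count the nonzero detail coefficients; zeros are what a coder can skip.
nonzero_details = sum(
    1 for band in (horiz, vert, diag) for row in band for x in row if x != 0
)
```

In this sketch the mean over the whole array can be read from the approximation band without touching the detail bands, which hints at how a coarse band could serve as a synopsis for analytical queries. Constant blocks produce all-zero detail coefficients, which is the kind of redundancy a wavelet-based coder can exploit; applying the transform again to the approximation band would yield the multi-level hierarchy of sub-arrays that the abstract refers to.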