To monitor critical infrastructure, high-quality sensors sampled at a high frequency are increasingly used. However, because they produce huge amounts of data, only simple aggregates are stored. This removes outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing time series with dimensions that exploits correlation in and among time series. Specifically, we propose compressing groups of correlated time series using an extensible set of model types within a user-defined error bound (possibly zero). We name this new category of model-based compression methods for time series Multi-Model Group Compression (MMGC). We present the first MMGC method, GOLEMM, and extend model types to compress groups of time series. We propose primitives that let users effectively define groups for data sets of different sizes and, based on these primitives, an automated grouping method that uses only the dimensions of the time series. We propose algorithms for executing simple and multi-dimensional aggregate queries directly on models. Last, we implement our methods in the Time Series Management System (TSMS) ModelarDB and refer to the result as ModelarDB+. Our evaluation shows that, compared to widely used formats, ModelarDB+ provides up to 13.7 times faster ingestion due to high compression, 113 times better compression due to the adaptivity of GOLEMM, 630 times faster aggregates by using models, and close to linear scalability. It is also extensible and supports online query processing.
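The core idea of the abstract, representing a group of correlated time series with one model inside a user-defined error bound and answering aggregates from the models, can be illustrated with a minimal Python sketch. This is not GOLEMM and not ModelarDB's API: it assumes a single constant (midrange) model type, an absolute error bound, and aligned series, and the names `Segment`, `compress_group`, `decompress`, and `sum_from_models` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    start: int    # index of the first time step covered
    length: int   # number of time steps covered
    value: float  # one constant standing in for every series in the group


def compress_group(group: Sequence[Sequence[float]], error_bound: float) -> List[Segment]:
    """Greedily cover aligned series with constant segments (illustrative only).

    Every original value lies within `error_bound` of its segment's constant,
    up to floating-point rounding.
    """
    n_points = len(group[0])
    if any(len(series) != n_points for series in group):
        raise ValueError("all series in a group must have the same length")

    segments: List[Segment] = []
    start, lo, hi = 0, float("inf"), float("-inf")
    for t in range(n_points):
        column = [series[t] for series in group]
        c_lo, c_hi = min(column), max(column)
        if c_hi - c_lo > 2 * error_bound:
            # Assumption of this sketch: a multi-model method would try
            # another model type here; this sketch simply gives up.
            raise ValueError(f"a constant model cannot represent time step {t}")
        new_lo, new_hi = min(lo, c_lo), max(hi, c_hi)
        if new_hi - new_lo > 2 * error_bound:
            # The running range no longer fits: close the segment, start anew.
            segments.append(Segment(start, t - start, (lo + hi) / 2))
            start, new_lo, new_hi = t, c_lo, c_hi
        lo, hi = new_lo, new_hi
    if n_points > 0:
        segments.append(Segment(start, n_points - start, (lo + hi) / 2))
    return segments


def decompress(segments: List[Segment], n_series: int) -> List[List[float]]:
    """Reconstruct approximate values for each series in the group."""
    row = [seg.value for seg in segments for _ in range(seg.length)]
    return [list(row) for _ in range(n_series)]


def sum_from_models(segments: List[Segment], n_series: int) -> float:
    """Approximate SUM over the whole group without reconstructing data points."""
    return sum(seg.value * seg.length * n_series for seg in segments)


# Two correlated (made-up) series that share one level shift.
group = [
    [10.0, 10.2, 10.1, 15.0, 15.1],
    [10.1, 10.3, 10.0, 15.2, 15.0],
]
segments = compress_group(group, error_bound=0.25)
print(len(segments), "segments represent", sum(len(s) for s in group), "data points")
print("approximate SUM:", sum_from_models(segments, n_series=len(group)))
```

Because each segment stores one constant for the entire group, the storage cost does not grow with the number of series as long as they stay within the bound of each other. The SUM is computed per segment rather than per data point, and its error is at most the error bound multiplied by the number of data points.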