Time-series data is increasingly used in the Industrial Internet of Things (IIoT) and in large-scale scientific experiments. Managing time-series data requires a storage engine that can keep up with its constantly growing volume while still providing acceptable query latency. While traditional ACID databases favor consistency over performance, many time-series databases with novel storage engines have been developed to offer better ingestion performance and lower query latency. To understand how the unique design of a time-series database affects its performance, we design SciTS, a highly extensible and parameterizable benchmark for time-series data. The benchmark studies the data ingestion capabilities of time-series databases, especially as the databases grow in size. It also measures the latencies of five practical queries drawn from the scientific-experiment use case. We use SciTS to evaluate four databases built on four distinct storage engines: ClickHouse, InfluxDB, TimescaleDB, and PostgreSQL.