Real-time analytics systems employ hybrid data layouts in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high insert rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge Tree (LSM-Tree) is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification that allows different data layouts in different levels, ranging from purely row-oriented to purely column-oriented; we call the result a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER, a prototype implementation of our idea built on top of the RocksDB key-value store.
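The per-level layout idea can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: it models each level's layout as a partition of the columns into groups (one group for a purely row-oriented level, one group per column for a purely column-oriented level), and counts how many columns a projection must read at each level. The schema, `LEVEL_LAYOUTS`, and all function names are hypothetical; they are not LASER's or RocksDB's API, and this is not the paper's cost model.

```python
from typing import Dict, List, Sequence, Tuple

# A layout is a partition of the schema's columns into column groups.
ColumnGroup = Tuple[str, ...]
Layout = List[ColumnGroup]

COLUMNS = ("a", "b", "c", "d")  # hypothetical four-column schema

# Hypothetical layouts, one per level: newer data (level 0) stays
# row-oriented, older data (deeper levels) becomes column-oriented.
LEVEL_LAYOUTS: List[Layout] = [
    [("a", "b", "c", "d")],            # level 0: purely row-oriented
    [("a", "b"), ("c", "d")],          # level 1: hybrid, two column groups
    [("a",), ("b",), ("c",), ("d",)],  # level 2: purely column-oriented
]


def is_valid_layout(layout: Layout, columns: Sequence[str]) -> bool:
    """Check that every column appears in exactly one group."""
    flat = [col for group in layout for col in group]
    return sorted(flat) == sorted(columns)


def split_record(record: Dict[str, object], layout: Layout) -> List[Dict[str, object]]:
    """Split one row into per-group fragments, as a merge into a level
    with a different layout would have to do."""
    return [{col: record[col] for col in group} for group in layout]


def columns_read(layout: Layout, projection: Sequence[str]) -> int:
    """Count columns fetched by a projection: every group that holds at
    least one requested column is read in full."""
    wanted = set(projection)
    return sum(len(group) for group in layout if wanted & set(group))


if __name__ == "__main__":
    row = {"a": 1, "b": 2, "c": 3, "d": 4}
    for level, layout in enumerate(LEVEL_LAYOUTS):
        assert is_valid_layout(layout, COLUMNS)
        print(level, split_record(row, layout), columns_read(layout, ["a"]))
```

In this toy model a scan of column `a` alone reads four columns at level 0, two at level 1, and one at level 2, while a point lookup of a full row touches one group at level 0 and four at level 2. That is the row-versus-column trade-off the abstract describes, expressed per level.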