Log-Structured Tables (LSTs), also commonly referred to as table formats, have recently emerged to bring consistency and isolation to object stores. With the separation of compute and storage, object stores have become the go-to option for highly scalable and durable storage. However, this shift comes with its own set of challenges, such as the lack of the recovery and concurrency management that traditional database management systems provide. This is where LSTs such as Delta Lake, Apache Iceberg, and Apache Hudi come into play: they provide an automatic metadata layer that manages tables defined over object stores, effectively addressing these challenges. A paradigm shift in the design of these systems necessitates updating evaluation methodologies as well. In this paper, we examine the characteristics of LSTs and propose extensions to existing benchmarks, including workload patterns and metrics, to accurately capture their performance. We introduce our framework, LST-Bench, which enables users to execute benchmarks tailored for the evaluation of LSTs. Our evaluation demonstrates how these benchmarks can be utilized to assess the performance, efficiency, and stability of LSTs. The code for LST-Bench is open source and available at https://github.com/microsoft/lst-bench/ .