In recent years, emerging storage hardware technologies have focused on divergent goals: better performance or lower cost-per-bit. Correspondingly, data systems that employ these technologies are typically optimized either to be fast (but expensive) or cheap (but slow). We take a different approach: by architecting a storage engine to natively utilize two tiers of storage, one fast and one low-cost, we can achieve a Pareto-efficient balance between performance and cost-per-bit. This paper presents the design and implementation of PrismDB, a novel key-value store that simultaneously exploits two extreme ends of the spectrum of modern NVMe storage technologies: 3D XPoint and QLC NAND. Our key contribution is a method for efficiently migrating and compacting data between two different storage tiers. Inspired by the classic cost-benefit analysis of log cleaning, we develop a new algorithm for multi-tiered storage compaction that balances the benefit of reclaiming space for hot objects in fast storage against the cost of compaction I/O in slow storage. Compared to the standard use of RocksDB on flash in datacenters today, PrismDB's average throughput on tiered storage is 3.3$\times$ faster and its read tail latency is 2$\times$ better, using equivalently-priced hardware.
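The abstract names the idea behind the compaction algorithm (a cost-benefit trade-off in the spirit of log-structured file system cleaning) but not its formula. The Python sketch below is only an illustration under stated assumptions. It first shows the classic LFS segment-cleaning score, $(1-u)\cdot\text{age}/(1+u)$. It then shows a hypothetical tiered analogue that ranks candidate key ranges by fast-tier space freed per unit of slow-tier I/O. The `KeyRange` model, `compaction_score`, and its cost model (read the overlapping slow-tier objects, then write them back merged with the demoted cold objects) are our assumptions, not PrismDB's actual algorithm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KeyRange:
    """Hypothetical candidate key range for fast-to-slow compaction."""
    name: str
    cold_objects: int   # cold objects in the fast tier that could be demoted
    slow_overlap: int   # objects already in the slow tier for this key range


def lfs_cost_benefit(utilization: float, age: float) -> float:
    """Classic LFS cleaning score: free space generated times age, per unit cost.

    Cleaning a segment with live fraction u reads 1 segment and writes u back,
    so cost = 1 + u, and the space freed is 1 - u.
    """
    return (1.0 - utilization) * age / (1.0 + utilization)


def compaction_score(r: KeyRange) -> float:
    """Hypothetical tiered analogue: fast-tier space reclaimed per slow-tier I/O."""
    if r.cold_objects == 0:
        return 0.0  # nothing to demote, so no benefit
    benefit = r.cold_objects
    # Assumed cost: read overlapping slow-tier objects, then write the merged run.
    cost = r.slow_overlap + (r.slow_overlap + r.cold_objects)
    return benefit / cost


def pick_range(ranges: Sequence[KeyRange]) -> KeyRange:
    """Choose the key range with the best benefit/cost ratio."""
    return max(ranges, key=compaction_score)


ranges = [
    KeyRange("a-f", cold_objects=100, slow_overlap=0),   # score 1.0
    KeyRange("g-m", cold_objects=100, slow_overlap=50),  # score 0.5
    KeyRange("n-z", cold_objects=0, slow_overlap=10),    # score 0.0
]
best = pick_range(ranges)
print(best.name)  # a-f
```

Under these assumptions, the scheduler prefers ranges that free many cold objects from the fast tier while touching little existing data in the slow tier. This mirrors how LFS prefers segments with low live utilization.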